Given a set of several inputs into a system (e.g., independent variables characterizing stimuli) and a set of several stochastically non-independent 
outputs (e.g., random variables describing different aspects of responses), how can one determine, for each of the outputs, which of the inputs it 
is influenced by? The problem has applications ranging from modeling pairwise comparisons to reconstructing mental processing architectures 
to conjoint testing. A necessary and sufficient condition for a given pattern of selective influences is provided by the Joint Distribution Criterion, 
according to which the problem of "what influences what" is equivalent to that of the existence of a joint distribution for a certain set of random 
variables. For inputs and outputs with finite sets of values this criterion translates into a test of consistency of a certain system of linear equations 
and inequalities (Linear Feasibility Test) which can be performed by means of linear programming. The Joint Distribution Criterion also leads to a 
metatheoretical principle for generating a broad class of necessary conditions (tests) for diagrams of selective influences. Among them is the class 
of distance-type tests based on the observation that certain functionals on jointly distributed random variables satisfy the triangle inequality. 

Keywords: conjoint testing, external factors, joint distribution, probabilistic causality, mental architectures, metrics on random variables, 
random outputs, selective influence, stochastic dependence, Thurstonian scaling. 




1. INTRODUCTION 

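This paper presents a general methodology of dealing with diagrams of selective influences, like this one:

[Diagram (1): arrows from the factors $\alpha$, $\beta$, $\delta$ to $A$; from $\beta$ to $B$; from $\alpha$, $\gamma$, $\delta$ to $C$.]    (1)
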
The Greek letters in this diagram represent inputs, or external factors, e.g., parameters of stimuli whose values can be chosen at will, or randomly vary but can be observed. The capital Roman letters stand for random outputs characterizing reactions of the system (an observer, a group of observers, a technical device, etc.). The arrows show which factor influences which random output. The factors are treated as deterministic entities: even if $\alpha, \beta, \gamma, \delta$ in reality vary randomly (e.g., being randomly generated by a computer program, or being concomitant parameters of observations, such as age of respondents), for the purposes of analyzing selective influences the random outputs $A, B, C$ are always viewed as conditioned upon various combinations of specific values of $\alpha, \beta, \gamma, \delta$.

The first question to ask is: what is the meaning of the above diagram if the random outputs $A, B, C$ in it are not necessarily stochastically independent? (If they are, the answer is of course trivial.) And once the meaning of the diagram of selective influences is established, how can one determine that this diagram correctly characterizes the dependence of the joint distributions of the random outputs $A, B, C$ on the external factors $\alpha, \beta, \gamma, \delta$? These questions are important, because the assumption of stochastic independence of the outputs more often than not is either demonstrably false or adopted for expediency alone, with no other justification, while the assumption of selectivity in causal relations between inputs and stochastic outputs is ubiquitous in theoretical modeling, often being built into the very language of the models.



1.1. An illustration: Pairwise comparisons 

Consider Thurstone's most general model of pairwise comparisons (Thurstone, 1927).^1 This model is predicated on the diagram

[Diagram (2): an arrow from $\alpha$ to $A$ and an arrow from $\beta$ to $B$.]    (2)

where $(A, B)$ are bivariate normally distributed random variables, and $\alpha, \beta$ are two stimuli being compared. The stimuli are identified by their "observation areas" (Dzhafarov, 2002): say, the label $\alpha$ may stand for "chronologically first" or "located to the left of the fixation point," and the label $\beta$ for, respectively, "chronologically second" or "located to the right of the fixation point." For our present purposes, $\alpha$ and $\beta$ are external factors with varying values (e.g., light intensity in, respectively, the first and second observation areas). The random variables $A$ and $B$ are supposed to represent some unidimensional property (say, brightness) of the images of, respectively, the stimuli $\alpha$ and $\beta$ (the emphasized word "respectively" indicating selectiveness). According to the model, the probability with which $\alpha$ is judged to have less of the property in question than $\beta$ equals $\Pr[A < B]$. The problem is: what restrictions should be imposed in this theoretical scheme on the bivariate-normal distribution of $A, B$ to ensure that $A$ is an image of the stimulus $\alpha$ alone and $B$ is an image of the stimulus $\beta$ alone, as opposed to both or either of them being an image of both the stimuli $\alpha$ and $\beta$? In other words, how can one distinguish, within the framework of Thurstone's general model, the diagram of selective influences (2) from the diagrams

[Diagrams (3): the alternatives to (2), in which $A$, $B$, or both are influenced by both $\alpha$ and $\beta$.]    (3)

^1 This model is known as Thurstonian Cases 1 and 2. The only difference between the two is that in Case 1 the responding system is an individual observer to whom pairs of stimuli are presented repeatedly, while in Case 2 the responding system is a group of people each responding to every pair of stimuli once. One can, of course, think of all kinds of mixed or intermediate situations.

Denoting by $A(x,y), B(x,y)$ the two random variables at the values $(x, y)$ of the factors $(\alpha, \beta)$,^2 intuition tells us that one should be able to write

$$A(x,y) = A(x), \quad B(x,y) = B(y)$$

if the diagram (2) holds, but not in the case of the diagrams (3). Clearly then, one should require that

$$\begin{aligned}
\mathrm{E}[A(x,y)] &= \mu_A(x), & \operatorname{Var}[A(x,y)] &= \sigma_{AA}(x), \\
\mathrm{E}[B(x,y)] &= \mu_B(y), & \operatorname{Var}[B(x,y)] &= \sigma_{BB}(y),
\end{aligned} \tag{4}$$

with the obvious notation for the parameters of the two distributions. These equations form an instance of what is called marginal selectivity (the notion introduced in Townsend & Schweickert, 1989) in the dependence of $(A, B)$ on $(\alpha, \beta)$: separately taken, the distribution of $A$ (here, normal) does not depend on $\beta$, nor the distribution of $B$ on $\alpha$. The problem is, however, in dealing with the covariance $\operatorname{Cov}[A(x,y), B(x,y)]$. If it is zero for all $x, y$ (i.e., $A$ and $B$ are always stochastically independent), the marginal selectivity is all one needs to speak of $\alpha$ selectively causing $A$ and $\beta$ selectively causing $B$. In general, however, the covariance depends on both $x$ and $y$,

$$\operatorname{Cov}[A(x,y), B(x,y)] = \sigma_{AB}(x,y).$$

It would be unsatisfactory to simply ignore stochastic interdependence among random variables and focus on marginal selectivity alone. It will be shown in Section 3.3 that marginal selectivity is too weak a concept to allow one to write $A(x,y) = A(x)$, $B(x,y) = B(y)$, because $A(x)$ generally does not preserve its identity (is not the same random variable) under different $y$, and analogously for $B(y)$ under different $x$. So one needs to answer the conceptual question: under what forms of the dependence of $\sigma_{AB}$ on $(x, y)$ can one say that the diagram (2) is correct? Even in the seemingly simple special cases one cannot rely on one's common sense alone. Thus, if $\sigma_{AB}(x,y) = \sigma_{AB}(x)$, what does this tell us about the selectiveness? Even simpler: what can one conclude if one finds out that $\sigma_{AB}(x,y) = \text{const} \neq 0$ across all $x, y$? After all, if $\sigma_{AB}$ is a constant, other measures of stochastic interdependence will be functions of both $x$ and $y$. For instance, the correlation coefficient then is

$$\operatorname{Cor}[A(x,y), B(x,y)] = \frac{\text{const}}{\sqrt{\sigma_{AA}(x)\,\sigma_{BB}(y)}} = \rho(x,y).$$

^2 It may seem unnecessary to use separate notation for factors and their values (levels), but it is in fact more convenient in view of the formal treatment presented below. The factors there are defined as sets of "factor points," and the latter are defined as factor values associated with particular factor names: e.g., $(x, '\alpha')$ is a factor point of factor $\alpha$.


One might be tempted to adopt a radical solution: to always attribute each of $A$ and $B$ to both $\alpha$ and $\beta$ (i.e., deny any selectiveness), unless $A$ and $B$ are stochastically independent and exhibit marginal selectivity. But a simple example will show that such an approach would be far too restrictive to be useful.

Consider the model in which the observer can be in one of two states of attention, or activation, called "attentive" and "inattentive," with probabilities $p$ and $1 - p$, respectively. When in the inattentive state, the stimuli $\alpha, \beta$ (with respective values $x, y$) cause independent normally distributed images $A(x), B(y)$, with parameters

$$\begin{aligned}
\mathrm{E}[A(x)] &= 0, & \operatorname{Var}[A(x)] &= 1, \\
\mathrm{E}[B(y)] &= 0, & \operatorname{Var}[B(y)] &= 1.
\end{aligned}$$

That is, in the inattentive state the distribution of the images does not depend on the stimuli at all. When in the attentive state, $A(x), B(y)$ remain independent and normally distributed, but their parameters change as

$$\begin{aligned}
\mathrm{E}[A(x)] &= \mu_A(x), & \operatorname{Var}[A(x)] &= 1, \\
\mathrm{E}[B(y)] &= \mu_B(y), & \operatorname{Var}[B(y)] &= 1.
\end{aligned}$$

We note that, first, $A$ and $B$ are stochastically independent in either state of attention; second, that $A$ does not depend on $\beta$ and $B$ does not depend on $\alpha$ in either state of attention; and third, that the switches from one attention state to another do not depend on the stimuli at all. It is intuitively clear then that the causality is selective here, in conformity with the diagram (2). But the overall distribution of $A, B$ in this example (a mixture of two bivariate normal distributions), while obviously satisfying marginal selectivity, has

$$\operatorname{Cov}[A(x,y), B(x,y)] = p(1-p)\,\mu_A(x)\,\mu_B(y) \neq 0.$$
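
The value of this covariance can be checked with the law of total covariance. The following is a short sketch of that step, in which $S$ is a symbol introduced here for illustration only: the indicator of the attentive state, with $\Pr[S = 1] = p$, so that $\mathrm{E}[A \mid S] = \mu_A(x)S$ and $\mathrm{E}[B \mid S] = \mu_B(y)S$.

```latex
\begin{aligned}
\operatorname{Cov}[A, B]
  &= \mathrm{E}\bigl[\operatorname{Cov}[A, B \mid S]\bigr]
   + \operatorname{Cov}\bigl[\mathrm{E}[A \mid S],\, \mathrm{E}[B \mid S]\bigr] \\
  &= 0 + \operatorname{Cov}\bigl[\mu_A(x)\,S,\; \mu_B(y)\,S\bigr] \\
  &= \mu_A(x)\,\mu_B(y)\,\operatorname{Var}[S]
   = p(1-p)\,\mu_A(x)\,\mu_B(y).
\end{aligned}
```

The first term vanishes because the images are independent within each state of attention; the covariance is therefore nonzero whenever $0 < p < 1$ and both means are nonzero.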

In the theory of selectiveness presented later in this paper it is easily proved that in this situation $A$ depends only on $\alpha$ and $B$ only on $\beta$, in spite of their stochastic interdependence (see Example 2.5).

It is instructive to see that if one ignores the issue of selectiveness and formulates Thurstone's general model as Thurstone did it himself, with no restrictions imposed on the covariance $\sigma_{AB}(x,y)$, the model becomes redundant and unfalsifiable, not just with respect to a finite matrix of data, but for any theoretical probability function

$$p(x,y) = \Pr[A(x,y) < B(x,y)] = \Phi\left(\frac{\mu_B(y) - \mu_A(x)}{\sqrt{\sigma_{AA}(x) + \sigma_{BB}(y) - 2\sigma_{AB}(x,y)}}\right), \tag{5}$$

where $\Phi$ is the standard normal integral. Denoting $z(x,y) = \Phi^{-1}(p(x,y))$, let $\mu_A(x)$ and $\mu_B(y)$ be any functions such that

$$\left|\frac{\mu_A(x) - \mu_B(y)}{z(x,y)}\right| < M$$

for some $M$. Then, putting $\sigma_{AA}(x) = \sigma_{BB}(y) = M^2/2$, one can always find the covariance $\sigma_{AB}(x,y)$ to satisfy (5). On a moment's reflection, this is what one should expect: without the assumption of selective influences Thurstone's general model is essentially the same as the vacuous "model" in which stimuli $\alpha$ and $\beta$ evoke a single normally distributed random variable $D(x,y)$ (interpretable as "subjective difference" between the value $x$ of $\alpha$ and the value $y$ of $\beta$), with the decision rule "say that $\beta$ exceeds $\alpha$ (in a given respect) if $D(x,y) < 0$, otherwise say that $\alpha$ exceeds $\beta$."
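
The construction just described can be illustrated numerically. The sketch below is not part of the original analysis: the function names `fitting_covariance` and `thurstone_probability` are hypothetical, the variance of $B - A$ is taken to be $\sigma_{AA} + \sigma_{BB} - 2\sigma_{AB}$, and it is assumed that $p \neq 1/2$ and that $\mu_B(y) - \mu_A(x)$ has the same sign as $z(x,y)$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, s.d. 1)


def fitting_covariance(p, mu_a, mu_b, m):
    """Return sigma_AB reproducing p = Pr[A < B] when sigma_AA = sigma_BB = m**2 / 2.

    Assumes p != 0.5 and that (mu_b - mu_a) has the same sign as z = Phi^{-1}(p).
    """
    z = STD_NORMAL.inv_cdf(p)
    if z == 0.0:
        raise ValueError("p = 0.5 makes the ratio (mu_b - mu_a) / z undefined")
    ratio = (mu_b - mu_a) / z  # required standard deviation of B - A
    if not 0.0 < ratio < m:
        raise ValueError("need 0 < (mu_b - mu_a) / z < m")
    # Var[B - A] = m**2 - 2 * sigma_AB has to equal ratio**2.
    return (m ** 2 - ratio ** 2) / 2.0


def thurstone_probability(mu_a, mu_b, s_aa, s_bb, s_ab):
    """Pr[A < B] for bivariate normal (A, B) with the given parameters."""
    return STD_NORMAL.cdf((mu_b - mu_a) / math.sqrt(s_aa + s_bb - 2.0 * s_ab))


# Arbitrary illustrative numbers: any target probability and any means
# satisfying the bound can be matched by a suitable covariance.
p_target, mu_a, mu_b, m = 0.8, 0.0, 1.0, 3.0
s_ab = fitting_covariance(p_target, mu_a, mu_b, m)
p_model = thurstone_probability(mu_a, mu_b, m ** 2 / 2, m ** 2 / 2, s_ab)
```

The returned covariance always lies between 0 and $M^2/2$, so it is admissible for variances equal to $M^2/2$; the target probability is recovered for any means within the bound, which is the sense in which the unrestricted model cannot be falsified.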

The importance of having a principled way of selectively attributing stochastic images to stimuli they represent is even more obvious in the context of the Thurstonian-type models applied to same-different rather than greater-less judgments (Dzhafarov, 2002). When combined with another constraint, called the "well-behavedness" of the random variables representing stimuli, the notion of selective influences has been shown to impose highly non-obvious constraints on the minima of discrimination functions and the relationship "$x$ of $\alpha$ is the best match for $y$ of $\beta$" (for details, see Dzhafarov, 2003b-c, 2006; Kujala & Dzhafarov, 2009).



1.2. History and related notions 

Historically, the notion of selective probabilistic causality was introduced in psychology by Sternberg (1969), in the context of the reconstruction of "stages" of mental processing. If $\alpha$ and $\beta$ are certain experimental manipulations (say, size of memory lists and legibility of items, respectively), and if $A$ and $B$ are durations of two hypothetical stages of processing (say, memory search and perception, respectively), then one can hope to test this hypothesis (that memory search and perception are indeed two stages, processes occurring one after another) only if one assumes that $A$ is selectively influenced by $\alpha$ and $B$ by $\beta$. Sternberg allows for the possibility of $A$ and $B$ being stochastically interdependent, but it seems that in this case he reduces the selectivity of the influence of $\alpha, \beta$ upon $A, B$ to a condition that is weaker than even marginal selectivity: the condition is that the mean value of $A$ only depends on $\alpha$ and the mean value of $B$ on $\beta$, while any other parameter of the distributions of $A$ and $B$, say, variance, may very well depend on both $\alpha$ and $\beta$.

Townsend (1984), basing his analysis on Townsend and Ashby (1983, Chapter 12), was the first to investigate the notion of selective influences without assuming that the processes which may be selectively influenced by factors are organized serially. He proposed to formalize the notion of selectively influenced and stochastically interdependent random variables by the concept of "indirect nonselectiveness": the conditional distribution of the variable $A$ given any value $b$ of the variable $B$ depends on $\alpha$ only, and, by symmetry, the conditional distribution of $B$ at any $A = a$ depends on $\beta$ only. Under the name of "conditionally selective influence" this notion was mathematically characterized and generalized in Dzhafarov (1999). Although interesting in its own right, this notion turns out to be inadequate, however, for capturing even the most obvious desiderata for the notion of selective influences. In particular, indirect nonselectiveness does not imply marginal selectivity; in fact, it is not even compatible with it in nontrivial cases. Consider Thurstone's general model again. If both the indirect nonselectiveness and marginal selectivity are satisfied, then



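$$\begin{aligned}
\mathrm{E}[A \mid B = b] &= \mu_A(x) + \frac{\sigma_{AB}(x,y)}{\sigma_{BB}(y)}\,\bigl(b - \mu_B(y)\bigr) = \mu_{A|b}(x), \\
\operatorname{Var}[A \mid B = b] &= \left(1 - \frac{\sigma_{AB}^2(x,y)}{\sigma_{AA}(x)\,\sigma_{BB}(y)}\right)\sigma_{AA}(x) = \sigma_{AA|b}(x), \\
\mathrm{E}[B \mid A = a] &= \mu_B(y) + \frac{\sigma_{AB}(x,y)}{\sigma_{AA}(x)}\,\bigl(a - \mu_A(x)\bigr) = \mu_{B|a}(y), \\
\operatorname{Var}[B \mid A = a] &= \left(1 - \frac{\sigma_{AB}^2(x,y)}{\sigma_{AA}(x)\,\sigma_{BB}(y)}\right)\sigma_{BB}(y) = \sigma_{BB|a}(y).
\end{aligned}$$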



It is not difficult to show that these equations can be satisfied if and only if either

(i) $\sigma_{AB}(x,y) = 0$, in which case the notions of indirect nonselectiveness and of marginal selectivity simply coincide; or

(ii) the joint distribution of $(A, B)$ does not depend on either $\alpha$ or $\beta$ (i.e., $\mu_A$, $\mu_B$, $\sigma_{AA}$, $\sigma_{BB}$, and $\sigma_{AB}$ are all constants).

Neither of these cases, of course, calls for indirect nonselectiveness as a separate notion.

The difficulty of developing a rigorous and useful definition of selective influences has nothing to do with the fact that in the above examples the random outputs in the diagrams of selective influences are unobservable. They may very well be entirely observable, at least on a sample level. An example would be two performance tests, with outcomes $A$ and $B$, conducted on a group of people divided into four subgroups according as they were trained or not trained for the A-test and for the B-test. It may be reasonable to hypothesize (at least for some pairs of tests) that the random test score $A$ is selectively influenced by the factor $\alpha$ with the values 'not trained for the A-test' and 'trained for the A-test', while the random test score $B$ is selectively influenced by the factor $\beta$ with the values 'not trained for the B-test' and 'trained for the B-test'. It is highly likely, however, that the values of $A$ and $B$ will be stochastically interdependent within each of the four subgroups.

The definition of selective influences we adopt in this paper was proposed in Dzhafarov (2003a), and further developed in Dzhafarov and Gluhovsky (2006), Kujala and Dzhafarov (2008), and Dzhafarov and Kujala (2010). Its rigorous formulation is given in Section 2, but the gist of it, when applied to a diagram like (2), is as follows: there is a random entity $R$ whose distribution does not depend on either of the factors $\alpha, \beta$, such that $A$ can be presented as a transformation of $R$ determined by the value $x$ of $\alpha$, and $B$ can be presented as a transformation of $R$ determined by the value $y$ of $\beta$, so that for every allowable pair $x, y$, the joint distribution of $A, B$ at these $x, y$ is the same as the joint distribution of the two corresponding transformations of $R$. In the case of the diagram (1), the transformations are

$$f_1(R, x, y, u), \quad f_2(R, y), \quad f_3(R, x, z, u),$$

where $x, y, z, u$ are values of $\alpha, \beta, \gamma, \delta$, respectively.

With some additional assumptions this definition has been applied to Thurstonian-type modeling for same-different comparisons (Dzhafarov, 2003b-c; Kujala & Dzhafarov, 2009), as well as to the hypothetical networks of processes underlying response times (Dzhafarov, Schweickert, & Sung, 2004; Schweickert, Fisher, & Goldstein, 2010). Unexplicated, intuitive uses of this notion's special versions can even be found in much earlier publications, such as Bloxom (1972), Schweickert (1982), and Dzhafarov (1992, 1997). In the latter two publications, for instance, response time is considered the sum of a signal-dependent and a signal-independent component, whose durations may very well be stochastically interdependent (even perfectly positively correlated).

Any combination of regression-analytic and factor-analytic models can be viewed as a special version of our definition of selective influences. When applied to the diagram (1), such a model would have the form

$$\begin{aligned}
f_1(R, x, y, u) &= h_1(C, x, y, u) + g_1(x, y, u)\,S_1, \\
f_2(R, y) &= h_2(C, y) + g_2(y)\,S_2, \\
f_3(R, x, z, u) &= h_3(C, x, z, u) + g_3(x, z, u)\,S_3,
\end{aligned}$$

where $C$ is a vector of random variables ("common sources of variation"), $S_1, S_2, S_3$ are "specific sources of variation," all sources of variation being stochastically independent. To recognize in this model our definition one should put $R = (C, S_1, S_2, S_3)$. With some distributional assumptions, this model, for every possible quadruple $(x, y, z, u)$, has the structure of the nonlinear factor analysis (McDonald, 1967, 1982); the more familiar linear structure is obtained by making $h_1, h_2, h_3$ linear in the components of $C$.^3

More details on the early history of the notion of selective influences can be found in Dzhafarov (2003a). The relation of this notion to that of "probabilistic explanation" in the sense of Suppes and Zanotti (1982) and to that of "probabilistic dimensionality" in psychometrics (Levine, 2003) is discussed in Dzhafarov and Gluhovsky (2006). The probabilistic foundations of the issues involved are elaborated in Dzhafarov and Gluhovsky (2006) and, especially, Dzhafarov and Kujala (2010).

^3 To avoid confusion, our use of the term "factor" is reserved for observable external inputs (corresponding to the use of the term in MANOVA); the unobservable "factors" of the factor analysis can be referred to in the present context as "sources of variation," or "sources of randomness."

Plan of the paper 

In this paper we are primarily concerned with necessary (and, under additional constraints, necessary and sufficient) conditions for diagrams of selective influences, like (1) or (2). We call these conditions "tests," in the same way as in mathematics we speak of the tests for convergence or for divisibility. That is, the meaning of the term is non-statistical. We assume that random outputs are known on the population level. The principles of constructing statistical tests based on our population level tests are discussed in Section 3.4.2, but specific statistical issues are outside the scope of this paper.

Unlike in Dzhafarov and Kujala (2010), we do not pursue the 
goal of maximal generality of formulations, focusing instead on 
the conceptual set-up that would apply to commonly encoun- 
tered experimental designs. This means a finite number of fac- 
tors, each having a finite number of values, with some (not nec- 
essarily all) combinations of the values of the factors serving as 
allowable treatments. It also means that the random outcomes 
influenced by these factors are random variables: their values 
are vectors of real numbers or elements of countable sets, rather 
than more complex structures, such as functions or sets. To keep 
the paper self-contained, however, we have added an appendix 
in which we formulate the main definitions and statements of the 
theory on a much higher level of generality: for arbitrary sets of 
factors, arbitrary sets of factor values, and arbitrarily complex 
random outcomes. 

In Section 2 we introduce the notion of several random variables influenced by several factors and formulate a definition of selective influences. In Section 3 we present the Joint Distribution Criterion, a necessary and sufficient condition for selective influences (or, if one prefers, an alternative definition thereof), and we list three basic properties of selective influences. In the same section we formulate the principle by which one can construct tests for selective influences, on population and sample levels. In Section 4 we describe the main and universally applicable test for selective influences, the Linear Feasibility Test. The test is universally applicable because every random outcome and every set of factors can be discretized into a finite number of categories. The Linear Feasibility Test is both a necessary and a sufficient condition for selective influences within the framework of the chosen discretization of inputs and outputs. In Section 5 we study tests based on "pseudo-quasi-metrics" defined on spaces of jointly distributed random variables, and we introduce many examples of such tests. Finally, in Section 6 we discuss, with less elaboration, two examples of non-distance-type tests.

2. BASIC NOTIONS 

2.1. Factors, factor points, treatments 

A factor $\alpha$, formally, is a set of factor points, each of which has the format "value (or level) $x$ of factor $\alpha$." In symbols, this can be presented as $(x, '\alpha')$, where $'\alpha'$ is the unique name of the set $\alpha$ rather than the set itself. It is convenient to write $x^{\alpha}$ in place of $(x, '\alpha')$. Thus, if a factor with the name 'intensity' has three levels, 'low,' 'medium,' and 'high,' then this factor is taken to be the set

$$\text{intensity} = \left\{\text{low}^{\text{intensity}}, \text{medium}^{\text{intensity}}, \text{high}^{\text{intensity}}\right\}.$$

There is no circularity here, for, say, the factor point $\text{low}^{\text{intensity}}$ stands for (value = low, name = 'intensity') rather than (value = low, set = intensity).

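In the main text we will deal with finite sets of factors $\Phi = \{\alpha_1, \ldots, \alpha_m\}$, with each factor $\alpha \in \Phi$ consisting of a finite number of factor points,

$$\alpha = \{v_1^{\alpha}, \ldots, v_{k_\alpha}^{\alpha}\}.$$

Clearly, $\alpha \cap \beta = \emptyset$ for any distinct $\alpha, \beta \in \Phi$.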

A treatment, as usual, is defined as the set of factor points containing one factor point from each factor,^4

$$\phi = \{x_1^{\alpha_1}, \ldots, x_m^{\alpha_m}\} \in \alpha_1 \times \cdots \times \alpha_m.$$

The set of treatments (used in an experiment or considered in a theory) is denoted by $T \subset \alpha_1 \times \cdots \times \alpha_m$ and assumed to be nonempty. Note that $T$ need not include all possible combinations of factor points. This is an important consideration in view of the "canonical rearrangement" described below. Also, incompletely crossed designs occur broadly: in an experiment because the entire set $\alpha_1 \times \cdots \times \alpha_m$ may be too large, or in a theory because certain combinations of factor points may be physically or logically impossible (e.g., contrast and shape cannot be completely crossed if zero is one of the values for contrast).

^4 We present treatments as sets $\{x_1^{\alpha_1}, \ldots, x_m^{\alpha_m}\}$ rather than vectors $(x_1^{\alpha_1}, \ldots, x_m^{\alpha_m})$, which would be a correct representation of elements of $\alpha_1 \times \cdots \times \alpha_m$, because the superscripting we use makes the ordering of the points $x_i^{\alpha_i}$ irrelevant.

Example 2.1. In the diagram (1), let $\alpha, \beta, \gamma$, and $\delta$ have respectively 3, 2, 1, and 2 values. Then these factors can be presented as

$$\Phi: \quad \alpha = \{1^{\alpha}, 2^{\alpha}, 3^{\alpha}\}, \quad \beta = \{1^{\beta}, 2^{\beta}\}, \quad \gamma = \{1^{\gamma}\}, \quad \delta = \{1^{\delta}, 2^{\delta}\}.$$

The only constraint on one's choice of the labels for the values (here, 1, 2, 3) is that within a factor they should be pairwise distinct. Due to the unique superscripting, no two factors can share a factor point. The maximum number of possible treatments in this example is 12, in which case

$$T = \left\{ \begin{array}{l}
\{1^{\alpha}, 1^{\beta}, 1^{\delta}\}, \{1^{\alpha}, 1^{\beta}, 2^{\delta}\}, \{1^{\alpha}, 2^{\beta}, 1^{\delta}\}, \{1^{\alpha}, 2^{\beta}, 2^{\delta}\}, \\
\{2^{\alpha}, 1^{\beta}, 1^{\delta}\}, \{2^{\alpha}, 1^{\beta}, 2^{\delta}\}, \{2^{\alpha}, 2^{\beta}, 1^{\delta}\}, \{2^{\alpha}, 2^{\beta}, 2^{\delta}\}, \\
\{3^{\alpha}, 1^{\beta}, 1^{\delta}\}, \{3^{\alpha}, 1^{\beta}, 2^{\delta}\}, \{3^{\alpha}, 2^{\beta}, 1^{\delta}\}, \{3^{\alpha}, 2^{\beta}, 2^{\delta}\}
\end{array} \right\}.$$

We have deleted $1^{\gamma}$ from all treatments because a factor with a single factor point can always be removed from a diagram (or added to a diagram, if convenient; see 0" notation in Section EH). □
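
The bookkeeping of Example 2.1 can be mirrored in a few lines of code. This is a minimal sketch under assumptions of our own: a factor point is encoded as a `(value, name)` pair, a factor as a `frozenset` of such pairs, and the helper `make_factor` and the spelled-out factor names are hypothetical.

```python
from itertools import product


def make_factor(name, values):
    """A factor is a set of factor points; each point is a (value, name) pair."""
    return frozenset((value, name) for value in values)


# Factors of Example 2.1, with the single-valued factor gamma left out.
factors = {
    "alpha": make_factor("alpha", [1, 2, 3]),
    "beta": make_factor("beta", [1, 2]),
    "delta": make_factor("delta", [1, 2]),
}

# Distinct factors share no points, because every point carries its factor's name.
names = sorted(factors)
pairwise_disjoint = all(
    factors[a].isdisjoint(factors[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
)

# A treatment is a set with exactly one point from each factor.
# Here T is the complete crossing; in general T may be any nonempty subset of it.
treatments = {frozenset(points) for points in product(*factors.values())}
```

Encoding treatments as sets rather than tuples reflects the remark that the ordering of factor points is irrelevant once each point is tagged with its factor name.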



2.2. Random variables 

A rigorous definition of a random variable (as a special case of a random entity) is given in the appendix. For simplicity of notation, any random variable $A$ considered in the main text may be assumed to be a vector of "more elementary" discrete and continuous random variables: for a discrete variable, the set of its possible values is countable (finite or infinite), and each value possesses a probability mass; in the continuous case, the set of possible values is $\mathbb{R}^{N}$ (vectors with real-valued components), and each $a \in \mathbb{R}^{N}$ possesses a conventional probability density. So a random variable $A$ consists of several jointly distributed components, $(A_1, \ldots, A_k)$, some (or all) of which are continuous and some (or all) of which are discrete. Note that random vectors in this terminology are random variables. The set of possible values of $A$ is denoted $\mathcal{A}$, and each $a \in \mathcal{A}$ has a mass/density value $p(a)$ associated with it.^5

Every vector of jointly distributed random variables $A = (A_1, \ldots, A_n)$ is a random variable, and every value $a = (a_1, \ldots, a_n) \in \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$ of this random variable possesses a joint mass/density $p(a) = p(a_1, \ldots, a_n)$; then for any subvector $(a_{i_1}, \ldots, a_{i_k})$ of $(a_1, \ldots, a_n)$ the mass/density $p_{i_1 \ldots i_k}(a_{i_1}, \ldots, a_{i_k})$ is obtained by summing and/or integrating $p(a_1, \ldots, a_n)$ across all possible values of $(a_1, \ldots, a_n) - (a_{i_1}, \ldots, a_{i_k})$. Note, however, that a vector of random variables $A = (A_1, \ldots, A_n)$ need not be a random variable, because $(A_1, \ldots, A_n)$ need not possess a joint distribution.
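
For discrete components, the summation that yields the mass of a subvector can be sketched as follows. The helper `marginal` and the numerical masses are hypothetical and serve only to illustrate the operation; the continuous case would replace the sum by an integral.

```python
from collections import defaultdict


def marginal(joint, keep):
    """Sum a joint probability mass function over the components not in `keep`.

    joint: dict mapping value tuples (a_1, ..., a_n) to probability masses.
    keep:  tuple of 0-based component positions to retain, e.g. (0, 2).
    """
    out = defaultdict(float)
    for values, mass in joint.items():
        out[tuple(values[i] for i in keep)] += mass
    return dict(out)


# Made-up joint masses for three binary components (A_1, A_2, A_3).
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.20, (0, 1, 1): 0.15,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.15, (1, 1, 1): 0.20,
}

# Mass function of the subvector (A_1, A_3), obtained by summing over A_2.
p_13 = marginal(joint, (0, 2))
```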

We use the relational symbol $\sim$ in the meaning of "is distributed as." $A \sim B$ is well defined irrespective of whether $A$ and $B$ are jointly distributed.

Let, for each treatment $\phi \in T$, there be a vector of jointly distributed random variables with the set of possible values $\mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$ (that does not depend on $\phi$) and probability mass/density $p_{\phi}(a_1, \ldots, a_n)$ that depends on $\phi$.^6 Then we say that we have a vector of jointly distributed random variables that depends on treatment $\phi$, and write

$$A(\phi) = (A_1, \ldots, A_n)(\phi), \quad \phi \in T.$$

A correct way of thinking of A((|)) is that it represents a set of 
vectors of jointly distributed random variables, each of these 
vectors being labeled (indexed) by a particular treatment. Any 



^ Probability mass/density is generally the Radon-Nikodym derivative with re- 
spect to the product of a counting measure and the Lebesgue measure on R'^ . 

* The invariance of A with respect to (|) (more generally, the invariance of the 
observation space for A with respect to (|)) is convenient to assume, but it is 
not essential for the theory. Its two justifications ai'e that (a) this requirement 
makes it natural to speak of "one and the same" A whose distribution changes 
with if rather than to speak (more correctly) of different random variables 
A ((|)) for different (|); and (b) in the context of selective influences one can 
always redefine the observation spaces for different treatments if to make them 
coincide (see Remark |A.6| in the appendix). 
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subvector of A ((|)) should also be written with the argument (|), 
say, (Ai,A2, A3) ((])). If (|) is explicated as (|) = |x' 



or, say, (|) = {3", 1^,1^}, we will write A{xf x] 
(A,B,C) (3", iP, 1^) instead of more correct A({;i:"', 
or(A,B,C)({3«,lP,lS}). 

It is important to note that for distinct treatments (|)i and (|)2 
the corresponding A((|)i) and A((|)2) do not possess a joint dis- 
tribution, they are stochastically unrelated. This is easy to un- 
derstand: since (|)i and (|)2 are mutually exclusive conditions for 
observing values of A, there is no non-arbitrary way of choos- 
ing which value a = (a 1 , . . . , a„) observed at isf \ should be paired 
with which value a' — (a'j, . . . observed at ^2- To consider 
A((|)i) and A((|)2) stochastically independent and to pair every 
possible value of A((|)i) with every possible value A((|)2) is as 
arbitrary as, say, to consider them positively correlated and to 
pair every quantile of A((|)i) with the corresponding quantile of 

Example 2.2. In diagram ([Hi, let <I> and T be as in Example 
im and let A,B,C be binary, 0/1, variables. Then (A,B,C)((j)) 
is defined, for each (|) = by a table of the following 

form: 



• • T^m' } 

""') or 



a p 5 


ABC 


Pr 


X y z 





pom 




1 


pool 




1 


POlO 




1 1 


poll 




1 


Pm 




1 1 


PW\ 




1 1 


P\w 




1 1 1 


p\n 



Subtreatments across all (|) e T can be viewed as admissible 
values of the subset of factors <!>,■(/= 1, .. . ,n). Note that isf,^. is 
empty whenever <I>; is empty. 

Example 2.3. In the diagram [T] having enumerated A, B,C by 
1,2,3, respectively, Oi = {a,p,5}, 4>2 = {P}, *3 = {a,y,8}. If 
the factor points are as in Examples 12. 1 1 and 12.21 then, choosing 
= {3",lP,n,2S}, we have = {3«,lP,2S}, (^ct,^ = {l^}, 
and (|)(i)3 = {3",n,2^} (where y and its only point \^ can be 
omitted everywhere, making, in particular, the treatments 
and (|) coincide). □ 

The definition below is a special case of the definition of se- 
lective influences given in the appendix. This definition will be 
easier to justify in terms of the Joint Distribution Criterion for- 
mulated in the next section. 

Definition 2.4 {Selective influences). A vector of random vari- 
ables A ((])) = (Ai , . . . ,A„) ((])) is said to satisfy a diagram of se- 
lective influences (|6]l if there is a random variable^ R taking val- 
ues on some set 2^ , and functions fi:<i>iXR^j4i{i= 1 , . . . , n), 
such that, for any treatment (|) £ T, 

(Ai,...,A„)(^) ^(/i (^4,,, /?),..., /„(^4.„,/?)). (7) 
We write then, schematically, (A 1 , . . . , A„ ) (<I>i , . . . , 4>„ ) . 

The qualifier "schematically" in reference to (Ai, . . . ,A„) ^ 
(<I>1,...,4>„) is due to the fact that (Ai,...,A„) is not well- 
defined without mentioning a treatment (|) at which these vari- 
ables are taken. This notation, therefore, is merely a compact 
way of referring to the diagram 

Example 2.5. Consider the Thurstonian "mixture" model de- 
scribed in the introduction: 



state 



l-P, 



separately for each of the 12 treatments. □ 



2.3. Selective influences 

Given a set of factors <I> = {ai , . . . , a,,,} and a vector A((|)) = 
(Ai, . . . ,A„)((|)) of random variables depending on treatment, a 
diagram of selective influences is a mapping 



M: {!,...,«} ^2" 



(6) 



inattentive 



Ha = Q,Oaa = 1 
flB = O.Obb = 1 
Oab = 



attentive 




The selectivity (A,B) ^ (a, P) here is shown by 



($2^{\Phi}$ being the set of subsets of $\Phi$), with the interpretation that

$\Phi_i = M(i)$

is the subset of factors (which may be empty) selectively influencing $A_i$ ($i = 1, \ldots, n$). The definition of selective influences is yet to be given (Definition 2.4), but for the moment think simply of arrows drawn from factors to random variables (or vice versa). The subset of factors $\Phi_i$ influencing $A_i$ determines, for any treatment $\phi \in T$, the subtreatment $\phi_{\Phi_i}$ defined as

$\phi_{\Phi_i} = \{x^{\alpha} \in \phi : \alpha \in \Phi_i\}$,



Even though $A(\phi)$ is a random variable, and $\Phi$ is a finite set of factors containing a finite set of factor points each, the requirement in the definition that $R$ be a random variable is unnecessarily restrictive: it is sufficient to require the existence of a random entity $R$ distributed on some probability space $(\mathcal{R}, \Sigma_{\mathcal{R}}, \mu)$ (see the appendix). It is shown in the appendix, however, based on the Joint Distribution Criterion, that if the definition is satisfied with an arbitrary $R$, then the latter can always be chosen to be a random variable: discrete, continuous, or mixed according as the variable $A(\phi)$ is discrete, continuous, or mixed. (Recall that in our terminology every vector of random variables is a random variable.) Moreover, $R$ can always be chosen to be distributed unit-uniformly, or according to any distribution function strictly increasing on any interval of reals constituting $\mathcal{R}$.
1. putting $R = (S, N_1, N_2)$, where $S$ is a Bernoulli (0/1) variable with $\Pr[S = 1] = p$, $N_1, N_2$ are standard normal variables, and the three variables are independent;

2. defining

$(f_1(x^{\alpha}, (S, N_1, N_2)), f_2(y^{\beta}, (S, N_1, N_2))) = (\mu_A(x^{\alpha}) S + N_1, \mu_B(y^{\beta}) S + N_2)$;

3. and observing that

$(\mu_A(x^{\alpha}) S + N_1, \mu_B(y^{\beta}) S + N_2) \sim (A, B)(x^{\alpha}, y^{\beta})$

for all treatments $\{x^{\alpha}, y^{\beta}\}$.

□
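The construction in Example 2.5 can be sketched in code. The fragment below is only an illustration under stated assumptions: the names `sample_R`, `f1`, and `f2` are hypothetical, and the means $\mu_A(x^{\alpha})$ and $\mu_B(y^{\beta})$ are passed in directly as the numbers `mu_a` and `mu_b`. The point it makes is that both outputs are functions of one and the same $R$, while each function sees only "its own" factor.

```python
import random

def sample_R(p, rng):
    """Draw R = (S, N1, N2): S ~ Bernoulli(p), N1 and N2 standard normal, all independent."""
    s = 1 if rng.random() < p else 0
    return (s, rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

def f1(mu_a, R):
    """Output A: uses the alpha-dependent mean mu_a and the common source R only."""
    s, n1, _ = R
    return mu_a * s + n1

def f2(mu_b, R):
    """Output B: uses the beta-dependent mean mu_b and the common source R only."""
    s, _, n2 = R
    return mu_b * s + n2

rng = random.Random(0)          # seeded only to make the sketch reproducible
R = sample_R(0.7, rng)          # one draw of the common source of randomness
pair = (f1(2.0, R), f2(3.0, R)) # one realization of (A, B) at a treatment
```

Because `f1` never receives `mu_b` (and `f2` never receives `mu_a`), the selectivity is built into the signatures of the two functions.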



Remark 2.6. Note that the components of $(f_1(\phi_{\Phi_1}, R), \ldots, f_n(\phi_{\Phi_n}, R))$ are jointly distributed for any given $\phi$ because they are functions of one and the same random variable. The components of $(A_1, \ldots, A_n)(\phi)$ are jointly distributed for any given $\phi$ by definition. There is, however, no joint distribution of these two vectors, $(f_1(\phi_{\Phi_1}, R), \ldots, f_n(\phi_{\Phi_n}, R))$ and $(A_1, \ldots, A_n)(\phi)$, for any $\phi$; and, as emphasized earlier, no joint distribution for $(A_1, \ldots, A_n)(\phi_1)$ and $(A_1, \ldots, A_n)(\phi_2)$, for distinct $\phi_1$ and $\phi_2$.



3. JOINT DISTRIBUTION CRITERION 
3.1. Canonical Rearrangement 

The simplest diagram of selective influences is bijective,

$\alpha_1 \to A_1, \ \ldots, \ \alpha_n \to A_n$. (8)

In this case we write $(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$ instead of $(A_1, \ldots, A_n) \looparrowleft (\Phi_1 = \{\alpha_1\}, \ldots, \Phi_n = \{\alpha_n\})$.

We can simplify the subsequent discussion without sacrificing generality by agreeing to reduce each diagram of selective influences to a bijective form, by appropriately redefining factors and treatments. It is almost obvious how this should be done. Given the subsets of factors $\Phi_1, \ldots, \Phi_n$ determined by a diagram of selective influences (6), each $\Phi_i$ can be viewed as a factor $\alpha_i^*$ identified with the set of factor points

$\alpha_i^* = \{(\phi_{\Phi_i})^{\alpha_i^*} : \phi \in T\}$,

in accordance with the notation we have adopted for factor points: $(\phi_{\Phi_i})^{\alpha_i^*} = (\phi_{\Phi_i}, \text{`}\alpha_i^*\text{'})$. If $\Phi_i$ is empty, then $\phi_{\Phi_i}$ is empty too, and we should designate a certain value, say $\varnothing^{\alpha_i^*}$, as a dummy factor point (the only element of factor $\alpha_i^*$). The set of treatments $T$ for the original factors $\{\alpha_1, \ldots, \alpha_m\}$ should then be redefined for the vector of new factors $(\alpha_1^*, \ldots, \alpha_n^*)$ as

$T^* = \{((\phi_{\Phi_1})^{\alpha_1^*}, \ldots, (\phi_{\Phi_n})^{\alpha_n^*}) : \phi \in T\} \subset \alpha_1^* \times \ldots \times \alpha_n^*$.

We call this redefinition of factor points, factors, and treatments the canonical rearrangement.
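As a rough illustration of the canonical rearrangement, the sketch below builds the new factor points and treatments for a small made-up design. Everything concrete in it is an assumption made for the example: the function name `canonical_rearrangement`, the representation of a treatment as a dictionary, the factor names, and the use of the empty tuple as the dummy factor point.

```python
def canonical_rearrangement(diagram, treatments):
    """diagram: list of sets of factor names, one set per random variable.
    treatments: list of dicts mapping factor name -> factor point.
    Returns one tuple of new factor points per original treatment."""
    new_treatments = []
    for phi in treatments:
        new_treatments.append(tuple(
            # The subtreatment of phi restricted to the factors in `subset`;
            # the empty tuple () plays the role of the dummy factor point.
            tuple(sorted((f, phi[f]) for f in subset))
            for subset in diagram))
    return new_treatments

# Hypothetical design: A is influenced by {alpha, beta}, B by {beta}, C by nothing.
diagram = [{"alpha", "beta"}, {"beta"}, set()]
treatments = [{"alpha": a, "beta": b} for a in (1, 2) for b in (1, 2, 3)]

new_T = canonical_rearrangement(diagram, treatments)
# Each new factor is the set of distinct subtreatments observed for it.
new_factors = [sorted({t[i] for t in new_T}) for i in range(len(diagram))]
```

In this toy design the new factors have 6, 3, and 1 points respectively, while the number of treatments is unchanged.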



Example 3.1. Diagram ([T]), with the factors defined as in Exam- 
ple l2. 1 K with y omitted), is reduced to a bijective form as follows: 



a" 



P^ = {{/}P*:/ep}, 

r3 = {K>4}'*:K>4}eax5}, 
with, respectively, 12, 2, and 6 factor points, and 

{xf4,xlY , {y^f , {z^4f I e at X P5 X ^ 



the number of treatments, obviously remaining the same, 12, as 
for the original factors. □ 

The purpose of canonical rearrangement is to achieve a bijective correspondence between factors and the random variables selectively influenced by these factors. Equivalently, we may say that the random variables following canonical rearrangement can be indexed by the factors (assumed to be) selectively influencing them. Thus, if we test the hypothesis that $(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$, we can, when convenient, write $A_{\{\alpha_1\}}$ in place of $A_1$, $A_{\{\alpha_2\}}$ in place of $A_2$, etc.



3.2. The criterion 

From now on let us assume that we deal with bijective diagrams of selective influences (8). The notation $\phi_{\Phi_i} = \phi_{\{\alpha_i\}}$ then indicates the singleton set $\{x^{\alpha_i}\} \subset \phi$. As usual, we write $x^{\alpha_i}$ in place of $\{x^{\alpha_i}\}$:

$\phi = \{x_1^{\alpha_1}, \ldots, x_n^{\alpha_n}\} \Rightarrow \phi_{\{\alpha_i\}} = x_i^{\alpha_i}$.

The definition of selective influences (Definition 2.4) then acquires the following form:

Definition 3.2 (Selective influences, bijective form). A vector of random variables $A(\phi) = (A_1, \ldots, A_n)(\phi)$ is said to satisfy a diagram of selective influences (8), and we write $(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$, if, for some random variable $R$ and for any treatment $\phi \in T$,

$(A_1, \ldots, A_n)(\phi) \sim (f_1(\phi_{\{\alpha_1\}}, R), \ldots, f_n(\phi_{\{\alpha_n\}}, R))$, (9)

where $f_i : \alpha_i \times \mathcal{R} \to \mathcal{A}_i$ ($i = 1, \ldots, n$) are some functions, with $\mathcal{R}$ denoting the set of possible values of $R$.

This definition is difficult to put to work, as it refers to an 
existence of a random variable R without showing how one can 



' See footnote|2] 
find it or prove that it cannot be found. In Dzhafarov and Kujala (2010), however, we have formulated a necessary and sufficient condition for $(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$ which circumvents this problem.

Criterion 3.3 (Joint Distribution Criterion, JDC). A vector of random variables $A(\phi) = (A_1, \ldots, A_n)(\phi)$ satisfies a diagram of selective influences (8) if and only if there is a vector of jointly distributed random variables

$H = (H_{x_1^{\alpha_1}}, \ldots, H_{x_{k_1}^{\alpha_1}}, \ \ldots, \ H_{x_1^{\alpha_n}}, \ldots, H_{x_{k_n}^{\alpha_n}})$,

one random variable for each factor point of each factor, such that

$(H_{\phi_{\{\alpha_1\}}}, \ldots, H_{\phi_{\{\alpha_n\}}}) \sim A(\phi)$ (10)

for every treatment $\phi \in T$.



Due to its central role, the simple proof of this criterion (for the general case of arbitrary factors and sets of random entities) is reproduced in the appendix. The vector $H$ in the formulation of the JDC is referred to as the JDC-vector for $A(\phi)$, or the hypothetical JDC-vector for $A(\phi)$, if the existence of such a vector of jointly distributed variables is in question.

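Example 3.4. For the diagram of selective influences

$(A, B, C) \looparrowleft (\alpha, \beta, \gamma)$

with $\alpha = \{1^{\alpha}, 2^{\alpha}\}$, $\beta = \{1^{\beta}, 2^{\beta}, 3^{\beta}\}$, $\gamma = \{1^{\gamma}, 2^{\gamma}, 3^{\gamma}, 4^{\gamma}\}$, and the set of allowable treatments

$T = \{\{1^{\alpha}, 2^{\beta}, 1^{\gamma}\}, \{1^{\alpha}, 2^{\beta}, 3^{\gamma}\}, \{2^{\alpha}, 1^{\beta}, 4^{\gamma}\}, \{1^{\alpha}, 3^{\beta}, 1^{\gamma}\}, \{2^{\alpha}, 3^{\beta}, 2^{\gamma}\}\}$,

the hypothetical JDC-vector is

$(H_{1^{\alpha}}, H_{2^{\alpha}}, H_{1^{\beta}}, H_{2^{\beta}}, H_{3^{\beta}}, H_{1^{\gamma}}, H_{2^{\gamma}}, H_{3^{\gamma}}, H_{4^{\gamma}})$,

the hypothesis being that

$(H_{1^{\alpha}}, H_{2^{\beta}}, H_{1^{\gamma}}) \sim (A, B, C)(1^{\alpha}, 2^{\beta}, 1^{\gamma})$,
$(H_{1^{\alpha}}, H_{2^{\beta}}, H_{3^{\gamma}}) \sim (A, B, C)(1^{\alpha}, 2^{\beta}, 3^{\gamma})$,
$(H_{2^{\alpha}}, H_{1^{\beta}}, H_{4^{\gamma}}) \sim (A, B, C)(2^{\alpha}, 1^{\beta}, 4^{\gamma})$,
$(H_{1^{\alpha}}, H_{3^{\beta}}, H_{1^{\gamma}}) \sim (A, B, C)(1^{\alpha}, 3^{\beta}, 1^{\gamma})$,
$(H_{2^{\alpha}}, H_{3^{\beta}}, H_{2^{\gamma}}) \sim (A, B, C)(2^{\alpha}, 3^{\beta}, 2^{\gamma})$.

This means, in particular, that $H_{1^{\alpha}}$ and $H_{2^{\alpha}}$ have the same set of values as $A$ (which, by our convention, does not depend on treatment), the set of values for $H_{1^{\beta}}$, $H_{2^{\beta}}$, and $H_{3^{\beta}}$ is the same as that of $B$, and the set of values for $H_{1^{\gamma}}$, $H_{2^{\gamma}}$, $H_{3^{\gamma}}$, and $H_{4^{\gamma}}$ is the same as that of $C$. □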

The JDC prompts a simple justification for our definition of selective influences. Let $(A, B, C) \looparrowleft (\alpha, \beta, \gamma)$, as in the previous example. Consider all treatments $\phi$ in which the factor point of $\alpha$ is fixed, say, at $1^{\alpha}$. If $(A, B, C) \looparrowleft (\alpha, \beta, \gamma)$, then in the vectors of random variables

$(A, B, C)(1^{\alpha}, 2^{\beta}, 1^{\gamma}), \ (A, B, C)(1^{\alpha}, 2^{\beta}, 3^{\gamma}), \ (A, B, C)(1^{\alpha}, 3^{\beta}, 1^{\gamma})$,

the marginal distribution of the variable $A$ is one and the same,

$A(1^{\alpha}, 2^{\beta}, 1^{\gamma}) \sim A(1^{\alpha}, 2^{\beta}, 3^{\gamma}) \sim A(1^{\alpha}, 3^{\beta}, 1^{\gamma})$.

But the intuition of selective influences requires more: that we can denote this variable $A(1^{\alpha})$ because it preserves its identity (and not just its distribution) no matter what other variables it is paired with, $(B, C)(2^{\beta}, 1^{\gamma})$, $(B, C)(2^{\beta}, 3^{\gamma})$, or $(B, C)(3^{\beta}, 1^{\gamma})$. Analogous statements hold for $A(2^{\alpha})$, $B(2^{\beta})$, $B(3^{\beta})$, $C(1^{\gamma})$. The JDC formalizes the intuitive notion of variables "preserving their identity" when entering in various combinations with each other: there are jointly distributed random variables

$H_{1^{\alpha}}, H_{2^{\alpha}}, H_{1^{\beta}}, H_{2^{\beta}}, H_{3^{\beta}}, H_{1^{\gamma}}, H_{2^{\gamma}}, H_{3^{\gamma}}, H_{4^{\gamma}}$

whose identity is defined by this joint distribution; when $H_{1^{\alpha}}$ is combined with random variables $H_{2^{\beta}}$ and $H_{1^{\gamma}}$, it forms the triad $(H_{1^{\alpha}}, H_{2^{\beta}}, H_{1^{\gamma}})$ whose distribution is the same as that of $(A, B, C)(1^{\alpha}, 2^{\beta}, 1^{\gamma})$; when the same random variable $H_{1^{\alpha}}$ is combined with random variables $H_{2^{\beta}}$ and $H_{3^{\gamma}}$, the triad $(H_{1^{\alpha}}, H_{2^{\beta}}, H_{3^{\gamma}})$ is distributed as $(A, B, C)(1^{\alpha}, 2^{\beta}, 3^{\gamma})$; and so on. The key concept is that it is one and the same $H_{1^{\alpha}}$ which is being paired with other variables, as opposed to different random variables $A(1^{\alpha}, 2^{\beta}, 1^{\gamma})$, $A(1^{\alpha}, 2^{\beta}, 3^{\gamma})$, $A(1^{\alpha}, 3^{\beta}, 1^{\gamma})$ which are identically distributed (cf. Example 3.7 below, which shows that the identity is not generally preserved if all we know is marginal selectivity).



3.3. Three basic properties of selective influences 

The three properties in question are immediate consequences 
of JDC. 



3.3.1. Property 1: Nestedness. 

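For any subset $\{i_1, \ldots, i_k\}$ of $\{1, \ldots, n\}$, if $(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$ then $(A_{i_1}, \ldots, A_{i_k}) \looparrowleft (\alpha_{i_1}, \ldots, \alpha_{i_k})$.

Example 3.5. In Example 3.4, if $(A, B, C) \looparrowleft (\alpha, \beta, \gamma)$, then $(A, C) \looparrowleft (\alpha, \gamma)$, because the JDC for $(A, B, C) \looparrowleft (\alpha, \beta, \gamma)$ implies that $(H_{1^{\alpha}}, H_{2^{\alpha}}, H_{1^{\gamma}}, H_{2^{\gamma}}, H_{3^{\gamma}}, H_{4^{\gamma}})$ are jointly distributed, and that

$(H_{1^{\alpha}}, H_{1^{\gamma}}) \sim (A, C)(1^{\alpha}, 1^{\gamma})$,
$(H_{1^{\alpha}}, H_{3^{\gamma}}) \sim (A, C)(1^{\alpha}, 3^{\gamma})$,
$(H_{2^{\alpha}}, H_{2^{\gamma}}) \sim (A, C)(2^{\alpha}, 2^{\gamma})$,
$(H_{2^{\alpha}}, H_{4^{\gamma}}) \sim (A, C)(2^{\alpha}, 4^{\gamma})$.

Analogously, $(A, B) \looparrowleft (\alpha, \beta)$ and $(B, C) \looparrowleft (\beta, \gamma)$. Statements with $\looparrowleft$ involving a single variable merely indicate the dependence of its distribution on the corresponding factor: thus, $A \looparrowleft \alpha$ simply means that the distribution of $A(x^{\alpha}, y^{\beta}, z^{\gamma})$ does not depend on $y^{\beta}$, $z^{\gamma}$. □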



3.3.2. Property 2: Complete Marginal Selectivity 

For any subset $\{i_1, \ldots, i_k\}$ of $\{1, \ldots, n\}$, if $(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$ then the $k$-marginal distribution of $(A_{i_1}, \ldots, A_{i_k})(\phi)$ does not depend on points of the factors outside $(\alpha_{i_1}, \ldots, \alpha_{i_k})$. In particular, the distribution of $A_i$ only depends on points of $\alpha_i$.

This is, of course, a trivial consequence of the nestedness property, but its importance lies in that it provides the easiest to check necessary condition for selective influences.

A $k$-marginal distribution is the distribution of a subset of $k$ random variables ($k \geq 1$) in a set of $n \geq k$ variables. In Townsend and Schweickert (1989) the property was formulated for 1-marginals of a pair of random variables. The adjective "complete" we use with "marginal selectivity" is to emphasize that we deal with all possible marginals rather than with just 1-marginals.

Example 3.6. Let the factors, factor points, and the set of treatments be as in Example 3.4. Let the distributions of $(A, B, C)$ at the five different treatments be as shown (each column is headed by the factor points of $\alpha$, $\beta$, $\gamma$ and lists Pr):

    A B C    (1,2,1)  (1,2,3)  (2,1,4)  (1,3,1)  (2,3,2)
    0 0 0      .2       0        .3       .4       .2
    0 0 1      .1       .3       0        .1       .1
    0 1 0      .1       .2       .3       0        .2
    0 1 1      .1       0        0        0        .1
    1 0 0      .1       .1       .3       0        .3
    1 0 1      .1       .1       0        .2       .1
    1 1 0      .1       .1       0        .1       0
    1 1 1      .2       .2       .1       .2       0

One can check that marginal selectivity holds for all 1-marginals: thus, irrespective of other factor points,

    α    Pr[A = 0]  Pr[A = 1]
    1      .5         .5
    2      .6         .4

    β    Pr[B = 0]  Pr[B = 1]
    1      .6         .4
    2      .5         .5
    3      .7         .3

    γ    Pr[C = 0]  Pr[C = 1]
    1      .5         .5
    2      .7         .3
    3      .4         .6
    4      .9         .1

One can also check that irrespective of the factor point of $\gamma$, the 2-marginal $(A, B)$ only depends on $\alpha$ and $\beta$ (columns are headed by the factor points of $\alpha$, $\beta$):

    A B    (1,2)  (2,1)  (1,3)  (2,3)
    0 0     .3     .3     .5     .3
    0 1     .2     .3     0      .3
    1 0     .2     .3     .2     .4
    1 1     .3     .1     .3     0

Marginal selectivity, however, is violated for the 2-marginal $(A, C)$: at the factor points $(1^{\alpha}, 1^{\gamma})$, if the factor point of $\beta$ is $2^{\beta}$,

    A C    Pr
    0 0    .3
    0 1    .2
    1 0    .2
    1 1    .3

but at $3^{\beta}$,

    A C    Pr
    0 0    .4
    0 1    .1
    1 0    .1
    1 1    .4

This means that the diagram of selective influences $(A, B, C) \looparrowleft (\alpha, \beta, \gamma)$ is ruled out. □
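The check carried out by hand in Example 3.6 can be sketched in a few lines of code. This is an illustration rather than a prescribed procedure: the names `TENTHS`, `marginal`, and `violations` are hypothetical, and the probabilities of the five treatment tables are entered in tenths (as integers) so that marginals can be compared exactly.

```python
from itertools import combinations, product

OUTCOMES = list(product((0, 1), repeat=3))  # values of (A, B, C): 000, 001, ..., 111

# Pr[(A, B, C) = outcome] in tenths, keyed by the treatment (alpha, beta, gamma).
TENTHS = {
    (1, 2, 1): [2, 1, 1, 1, 1, 1, 1, 2],
    (1, 2, 3): [0, 3, 2, 0, 1, 1, 1, 2],
    (2, 1, 4): [3, 0, 3, 0, 3, 0, 0, 1],
    (1, 3, 1): [4, 1, 0, 0, 0, 2, 1, 2],
    (2, 3, 2): [2, 1, 2, 1, 3, 1, 0, 0],
}

def marginal(treatment, idx):
    """Marginal distribution (in tenths) of the variables at positions idx."""
    out = {}
    for outcome, w in zip(OUTCOMES, TENTHS[treatment]):
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0) + w
    return out

def violations(idx):
    """Pairs of treatments that share the factor points at positions idx
    but disagree on the corresponding marginal distribution."""
    return [(s, t) for s, t in combinations(TENTHS, 2)
            if all(s[i] == t[i] for i in idx)
            and marginal(s, idx) != marginal(t, idx)]

ac_violations = violations((0, 2))  # the 2-marginal (A, C)
ab_violations = violations((0, 1))  # the 2-marginal (A, B)
```

With these tables only the $(A, C)$ marginal produces a violating pair, namely the treatments $(1^{\alpha}, 2^{\beta}, 1^{\gamma})$ and $(1^{\alpha}, 3^{\beta}, 1^{\gamma})$.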

As pointed out in Section 1, the marginal selectivity property
alone is too weak to define selective influences. The example be- 
low demonstrates that the property of marginal selectivity does 
not allow one to treat each of the random variables as preserving 
its identity in different combinations of "its" factor with other 
factors. 

Example 3.7. Let $\alpha = \{1^{\alpha}, 2^{\alpha}\}$, $\beta = \{1^{\beta}, 2^{\beta}\}$, and the set of allowable treatments $T$ consist of all four possible combinations of the factor points. Let $A$ and $B$ be Bernoulli variables distributed as shown (columns are headed by the factor points of $\alpha$, $\beta$ and list Pr):

    A B    (1,1)  (2,1)  (1,2)  (2,2)
    0 0     .1     0      .09    0
    0 1     0      .9     .01    .9
    1 0     0      .1     .81    .1
    1 1     .9     0      .09    0

Marginal selectivity is satisfied: $\Pr[A(1^{\alpha}, \cdot) = 0] = 0.1$ and $\Pr[A(2^{\alpha}, \cdot) = 0] = 0.9$ irrespective of whether the placeholder is replaced with $1^{\beta}$ or $2^{\beta}$; and analogously for $B$. If we assume, however, that this allows us to write $A(1^{\alpha})$, $A(2^{\alpha})$, $B(1^{\beta})$, $B(2^{\beta})$ instead of $A(1^{\alpha}, 1^{\beta})$, $A(1^{\alpha}, 2^{\beta})$, etc., we will run into a contradiction. From the tables for $\phi = \{1^{\alpha}, 1^{\beta}\}$, $\{2^{\alpha}, 1^{\beta}\}$, and $\{2^{\alpha}, 2^{\beta}\}$, we can successively conclude $A(1^{\alpha}) = B(1^{\beta})$, $A(2^{\alpha}) = 1 - B(1^{\beta})$, and $A(2^{\alpha}) = 1 - B(2^{\beta})$. But then $A(1^{\alpha}) = B(2^{\beta})$, which contradicts the table for $\phi = \{1^{\alpha}, 2^{\beta}\}$, where $A(1^{\alpha})$ and $B(2^{\beta})$ are stochastically independent and nonsingular. This contradiction proves that the diagram of selective influences $(A, B) \looparrowleft (\alpha, \beta)$ cannot be inferred from the compliance with marginal selectivity. □

3.3.3. Invariance under factor-point-specific transformations

Let $(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$ and

$H = (H_{x_1^{\alpha_1}}, \ldots, H_{x_{k_1}^{\alpha_1}}, \ \ldots, \ H_{x_1^{\alpha_n}}, \ldots, H_{x_{k_n}^{\alpha_n}})$

be the JDC-vector for $(A_1, \ldots, A_n)(\phi)$. Let $F(H)$ be any function that applies to $H$ componentwise and produces a corresponding vector of random variables

$F(H) = (F(x_1^{\alpha_1}, H_{x_1^{\alpha_1}}), \ldots, F(x_{k_1}^{\alpha_1}, H_{x_{k_1}^{\alpha_1}}), \ \ldots, \ F(x_1^{\alpha_n}, H_{x_1^{\alpha_n}}), \ldots, F(x_{k_n}^{\alpha_n}, H_{x_{k_n}^{\alpha_n}}))$,

where we denote by $F(x^{\alpha}, \cdot)$ the application of $F$ to the component labeled by $x^{\alpha}$. Clearly, $F(H)$ possesses a joint distribution and contains one component for each factor point. If we now define a vector of random variables $B(\phi)$ for every treatment $\phi \in T$ as

$(B_1, \ldots, B_n)(\phi) = (F(\phi_{\{\alpha_1\}}, A_1), \ldots, F(\phi_{\{\alpha_n\}}, A_n))(\phi)$,

then

$(B_1, \ldots, B_n)(\phi) \sim (F(\phi_{\{\alpha_1\}}, H_{\phi_{\{\alpha_1\}}}), \ldots, F(\phi_{\{\alpha_n\}}, H_{\phi_{\{\alpha_n\}}}))$,

and it follows from JDC that $(B_1, \ldots, B_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$. A function $F(x^{\alpha_i}, \cdot)$ can be referred to as a factor-point-specific transformation of the random variable $A_i$, because the random variable is generally transformed differently for different points of the factor assumed to selectively influence it. We can formulate the property in question by saying that a diagram of selective influences is invariant under all factor-point-specific transformations of the random variables. Note that this includes as a special case transformations which are not factor-point-specific, with

$F(x^{\alpha}, \cdot) \equiv F(\alpha, \cdot)$.



Since it is possible that $F(x^{\alpha}, H_{x^{\alpha}})$ and $F(y^{\alpha}, H_{y^{\alpha}})$, with $x^{\alpha} \neq y^{\alpha}$, have different sets of possible values, strictly speaking, one may need to redefine the functions to ensure that the set of possible values for $B(\phi)$ is the same for different $\phi$. This is, however, not essential (see footnote 6).

Example 3.8. Let the set-up be the same as in Example 3.7 except for the distributions of $(A, B)$ at the four treatments: we now assume that these distributions are such that $(A, B) \looparrowleft (\alpha, \beta)$. The tables below show all factor-point-specific transformations $A \to A^*$ and $B \to B^*$ at the four treatments, provided that the sets of possible values of $A^*$ and $B^*$ are, respectively, {*, •} and {▷, ○}, and that at the treatment $\{1^{\alpha}, 1^{\beta}\}$ the value 0 of $A$ is mapped into * and the value 0 of $B$ is mapped into ▷.

    α  β    A → A*          B → B*
    1  1    0 → *, 1 → •    0 → ▷, 1 → ○
    1  2    0 → *, 1 → •    0 → ○, 1 → ▷
    2  1    0 → •, 1 → *    0 → ▷, 1 → ○
    2  2    0 → •, 1 → *    0 → ○, 1 → ▷

    α  β    A → A*          B → B*
    1  1    0 → *, 1 → •    0 → ▷, 1 → ○
    1  2    0 → *, 1 → •    0 → ▷, 1 → ○
    2  1    0 → •, 1 → *    0 → ▷, 1 → ○
    2  2    0 → •, 1 → *    0 → ▷, 1 → ○

    α  β    A → A*          B → B*
    1  1    0 → *, 1 → •    0 → ▷, 1 → ○
    1  2    0 → *, 1 → •    0 → ○, 1 → ▷
    2  1    0 → *, 1 → •    0 → ▷, 1 → ○
    2  2    0 → *, 1 → •    0 → ○, 1 → ▷

    α  β    A → A*          B → B*
    1  1    0 → *, 1 → •    0 → ▷, 1 → ○
    1  2    0 → *, 1 → •    0 → ▷, 1 → ○
    2  1    0 → *, 1 → •    0 → ▷, 1 → ○
    2  2    0 → *, 1 → •    0 → ▷, 1 → ○

The possible transformations are restricted to these four because we adhere to our convention that $A$ has the same set of values at all treatments, and the same is true for $B$. This convention, however, is not essential, and nothing else in the theory prevents one from thinking of $A$ at different treatments as arbitrarily different random variables. With this "relaxed" approach, the following table gives an example of a factor-point-specific transformation:

    α  β    A → A*              B → B*
    1  1    0 → 0,  1 → 1       0 → 0,  1 → 1
    1  2    0 → 0,  1 → 1       0 → -2, 1 → 3
    2  1    0 → 10, 1 → -20     0 → 0,  1 → 1
    2  2    0 → 10, 1 → -20     0 → -2, 1 → 3

If this is considered undesirable, the variables $(A^*, B^*)$ can be redefined to have $\{-20, 0, 1, 10\}$ and $\{-2, 0, 1, 3\}$ as the respective sets of their possible values, assigning zero probabilities to the values that cannot be attained at a given factor point. □

This property is of critical importance for construction and 
use of tests for selective influences, as defined in the next sec- 
tion. A test, generally, lacks the invariance property just formu- 
lated: e.g., if the transformation consists in grouping of the orig- 
inal values of random variables, different groupings may result 
in different outcomes of certain tests, fail or pass. Such a test 
then can be profitably applied to various factor-point-specific 
transformations of an original set of random variables, creating 
thereby in place of a single test a multitude of tests with poten- 
tially different outcomes (a single negative outcome ruling out 
the hypothesis of selective influences). 



3.4. General principles for constructing tests for selective influences

3.4.1. Population level tests

Given a set of factors $\{\alpha_1, \ldots, \alpha_n\}$, a vector of random variables depending on treatments, $(A_1, \ldots, A_n)(\phi)$, and the hypothesis $(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$, a test for this hypothesis is a statement $\mathfrak{S}$ relating to each other $(A_1, \ldots, A_n)(\phi)$ for different treatments $\phi \in T$ which (a) holds true if $(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$, and (b) does not always hold true if this hypothesis is false. A test for a diagram of selective influences therefore is a necessary condition: if the variables $\{(A_1, \ldots, A_n)(\phi) : \phi \in T\}$ fail it (i.e., if $\mathfrak{S}$ is false for this set of random variables), we know that the hypothesis $(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$ is false. If the statement $\mathfrak{S}$ is always false when $(A_1, \ldots, A_n) \not\looparrowleft (\alpha_1, \ldots, \alpha_n)$, the test becomes a criterion for selective influences. A test or criterion can be restricted to special classes of random variables (e.g., random variables with finite numbers of values, or multivariate normally distributed at every treatment) and/or factor sets (e.g., $2 \times 2$ experimental designs).

The JDC provides a general logic for constructing such tests: we ask whether the hypothetical JDC-vector

$H = (H_{x_1^{\alpha_1}}, \ldots, H_{x_{k_1}^{\alpha_1}}, \ \ldots, \ H_{x_1^{\alpha_n}}, \ldots, H_{x_{k_n}^{\alpha_n}})$,

containing one variable for each factor point of each factor, can be assigned a joint distribution such that its marginals corresponding to the subsets of factor points that form treatments $\phi \in T$ are distributed as $(A_1, \ldots, A_n)(\phi)$. Put more succinctly: is there a joint distribution of $H$ with given marginal distributions of the vectors $H_{\phi} = (H_{\phi_{\{\alpha_1\}}}, \ldots, H_{\phi_{\{\alpha_n\}}})$ for all $\phi \in T$?

Surprisingly, at least for the authors, a slightly less general version of the same problem (the existence of a joint distribution compatible with observable marginals) plays a prominent role in quantum mechanics, in dealing with the quantum entanglement problem (Fine, 1982a-b). We are grateful to Jerome Busemeyer for bringing this fact to our attention. The parallels with quantum mechanics will be discussed in a separate publication.

Thus, in a study of random variables $(A, B)$ in a $2 \times 2$ factorial design, with $\alpha = \{1^{\alpha}, 2^{\alpha}\}$, $\beta = \{1^{\beta}, 2^{\beta}\}$, and $T$ containing all four logically possible treatments, we consider a hypothetical JDC-vector $(H_{1^{\alpha}}, H_{2^{\alpha}}, H_{1^{\beta}}, H_{2^{\beta}})$ of which we know the four 2-marginal distributions corresponding to treatments:

$H_{1^{\alpha} 1^{\beta}} = (H_{1^{\alpha}}, H_{1^{\beta}}) \sim (A, B)(1^{\alpha}, 1^{\beta})$,
$H_{1^{\alpha} 2^{\beta}} = (H_{1^{\alpha}}, H_{2^{\beta}}) \sim (A, B)(1^{\alpha}, 2^{\beta})$,
etc.

Of course, we also know the lower-level marginals, in this case the marginal distributions of $H_{1^{\alpha}}$, $H_{2^{\alpha}}$, $H_{1^{\beta}}$, and $H_{2^{\beta}}$, but they need not be considered separately as they are determined by the higher-order marginals. The question one poses within the logic of JDC is: can one assign probability densities to different values of $H = (H_{1^{\alpha}}, H_{2^{\alpha}}, H_{1^{\beta}}, H_{2^{\beta}})$ so that the computed marginal distributions of $(H_{1^{\alpha}}, H_{1^{\beta}})$, $(H_{1^{\alpha}}, H_{2^{\beta}})$, etc., coincide with the known ones?

If the vector $A = (A_1, \ldots, A_n)$ has a finite number of possible values (we may state this without mentioning $\phi$ because, by our convention, the set of values does not depend on $\phi$), then so does the vector $H$, and the logic of JDC is directly implemented in the Linear Feasibility Test introduced in the next section. When the set of values for $A$ is infinite or too large to be handled by the Linear Feasibility Test, one may have to use an indirect approach: computing from the distribution of each $H_{\phi}$ certain functionals $g_1(H_{\phi}), \ldots, g_m(H_{\phi})$ and constructing a statement

$\mathfrak{S}(g_1(H_{\phi}), \ldots, g_m(H_{\phi}) : \phi \in T)$

relating to each other these functionals for all $\phi \in T$. The statement should be chosen so that it holds true if $H$ possesses a joint distribution, but may be (or, better still, always is) false otherwise.

We illustrate this logic on a simple distance test of the variety introduced in Kujala and Dzhafarov (2008). Assuming that all random variables in $(A_1, \ldots, A_n)$ take their values in the set of reals, for each pair of factor points $\{x^{\alpha}, y^{\beta}\}$ define

$M x^{\alpha} y^{\beta} = E[|H_{x^{\alpha}} - H_{y^{\beta}}|]$,

where, for convenience, we write $M x^{\alpha} y^{\beta}$ in place of $M(x^{\alpha}, y^{\beta})$. It can be easily shown that $M$ is a metric on the set $H$ if $H$ possesses a joint distribution for its components. For each treatment $\phi$, define the functional

$g_{\alpha, \beta}(H_{\phi}) = M \phi_{\{\alpha\}} \phi_{\{\beta\}}$,

whose value can be computed from the known distributions:

$M \phi_{\{\alpha\}} \phi_{\{\beta\}} = E[|A_{\{\alpha\}}(\phi) - A_{\{\beta\}}(\phi)|]$, (11)

where $A_{\{\alpha\}}(\phi)$ and $A_{\{\beta\}}(\phi)$ are the random variables in $(A_1, \ldots, A_n)(\phi)$ which are supposed to be selectively influenced by $\alpha$ and $\beta$, respectively. Due to the marginal selectivity (which we assume to hold because otherwise selective influences have already been ruled out), this quantity is the same for all treatments $\phi$ which contain the same factor points $x^{\alpha}, y^{\beta}$ of factors $\alpha, \beta$. The statement $\mathfrak{S}$ is then as follows: for any (not necessarily pairwise distinct) treatments $\phi^1, \ldots, \phi^l \in T$ and any factors $\alpha^1, \ldots, \alpha^l \in \Phi$ ($l \geq 3$) such that

$\alpha^1 \neq \alpha^2 \neq \ldots \neq \alpha^{l-1} \neq \alpha^l \neq \alpha^1$, (12)

and

$\phi^l_{\{\alpha^1\}} = \phi^1_{\{\alpha^1\}}, \ \phi^1_{\{\alpha^2\}} = \phi^2_{\{\alpha^2\}}, \ \ldots, \ \phi^{l-2}_{\{\alpha^{l-1}\}} = \phi^{l-1}_{\{\alpha^{l-1}\}}, \ \phi^{l-1}_{\{\alpha^l\}} = \phi^l_{\{\alpha^l\}}$, (13)

we should have

$g_{\alpha^1, \alpha^l}(H_{\phi^l}) \leq g_{\alpha^1, \alpha^2}(H_{\phi^1}) + \ldots + g_{\alpha^{l-1}, \alpha^l}(H_{\phi^{l-1}})$. (14)

The truth of $\mathfrak{S}$ for $H$ with jointly distributed components follows from the triangle inequality for $M$. The inequality may very well be violated when the components of $H$ do not possess a joint distribution (i.e., when the hypothesis of selective influences is false).

A functional $g(X)$ is a function mapping each random variable $X$ from some set of random variables into, typically, a real or complex number (more generally, an element of a certain "standard" set). A typical example of a functional is the expected value $E[X]$.



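Example 3.9. To apply this test to Example 3.7 we make use of the property that if $(A, B) \looparrowleft (\alpha, \beta)$ then $(A^*, B^*) \looparrowleft (\alpha, \beta)$ for any factor-point-specific transformations $(A^*, B^*)$ of $(A, B)$. Let us put $B^* = B$ and

$A^* = A$ if $\phi_{\{\alpha\}} = 1^{\alpha}$, and $A^* = 1 - A$ if $\phi_{\{\alpha\}} = 2^{\alpha}$.

This yields the distributions (columns are headed by the factor points of $\alpha$, $\beta$ and list Pr)

    A* B*    (1,1)  (2,1)  (1,2)  (2,2)
    0  0      .1     .1     .09    .1
    0  1      0      0      .01    0
    1  0      0      0      .81    0
    1  1      .9     .9     .09    .9

It is easy to check that

$M 1^{\alpha} 1^{\beta} = E[|A^* - B^*|(1^{\alpha}, 1^{\beta})] = 0$,
$M 1^{\alpha} 2^{\beta} = E[|A^* - B^*|(1^{\alpha}, 2^{\beta})] = 0.82$,
$M 2^{\alpha} 1^{\beta} = E[|A^* - B^*|(2^{\alpha}, 1^{\beta})] = 0$,
$M 2^{\alpha} 2^{\beta} = E[|A^* - B^*|(2^{\alpha}, 2^{\beta})] = 0$.

Since

$0.82 = M 1^{\alpha} 2^{\beta} > M 1^{\alpha} 1^{\beta} + M 2^{\alpha} 1^{\beta} + M 2^{\alpha} 2^{\beta} = 0$,

the triangle inequality is violated, rejecting thereby the hypothesis $(A^*, B^*) \looparrowleft (\alpha, \beta)$, hence also the hypothesis $(A, B) \looparrowleft (\alpha, \beta)$. □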
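The computation in Example 3.9 is small enough to reproduce directly. The sketch below is only an illustration: the names `EX_3_7`, `transform`, and `M` are hypothetical, the tables are those of Example 3.7, and the transformation is the one chosen in Example 3.9 ($A^* = A$ at $1^{\alpha}$, $A^* = 1 - A$ at $2^{\alpha}$, $B^* = B$).

```python
# Joint distributions of (A, B) from Example 3.7, keyed by the treatment
# (alpha point, beta point); zero-probability outcomes are omitted.
EX_3_7 = {
    (1, 1): {(0, 0): 0.1, (1, 1): 0.9},
    (2, 1): {(0, 1): 0.9, (1, 0): 0.1},
    (1, 2): {(0, 0): 0.09, (0, 1): 0.01, (1, 0): 0.81, (1, 1): 0.09},
    (2, 2): {(0, 1): 0.9, (1, 0): 0.1},
}

def transform(table, x):
    """Factor-point-specific transformation: A* = A at alpha point 1,
    A* = 1 - A at alpha point 2; B* = B everywhere."""
    return {((a if x == 1 else 1 - a), b): p for (a, b), p in table.items()}

def M(table):
    """E|A* - B*| computed from the joint distribution at one treatment."""
    return sum(p * abs(a - b) for (a, b), p in table.items())

dist = {t: M(transform(tab, t[0])) for t, tab in EX_3_7.items()}

# Chain inequality (14) for the four treatments of the 2 x 2 design.
violated = dist[(1, 2)] > dist[(1, 1)] + dist[(2, 1)] + dist[(2, 2)]
```

Other factor-point-specific transformations can be tried by editing `transform`; a single violation is enough to rule out the hypothesis.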



3.4.2. Sample-level tests 

Although this paper is not concerned with statistical questions, it may be useful to outline the general logic of constructing a sample-level test corresponding to a population-level one. Analytic procedures and asymptotic approximations have to be different for different tests, but if the population-level test can be computed efficiently, the following Monte-Carlo procedure is always applicable.

1. For each of the random variables $A_1, \ldots, A_n$, if it has more than a finite number of values (or has too many values, even if finite), we discretize it in the conventional way, by forming successive adjacent intervals and replacing each of them with its midpoint. Continue to denote the discretized random variables $A_1, \ldots, A_n$.

2. We now have sample proportions $\hat{\Pr}[(A_1 = a_1, \ldots, A_n = a_n)(x^{\alpha_1}, \ldots, x^{\alpha_n})]$, where $a_1, \ldots, a_n$ are possible values of the corresponding random variables $A_1, \ldots, A_n$.

3. For each treatment, we form a confidence region of possible probabilities $\Pr[(A_1 = a_1, \ldots, A_n = a_n)(x^{\alpha_1}, \ldots, x^{\alpha_n})]$ for a given set of estimates, at a given familywise confidence level for the Cartesian product of these confidence regions, with an appropriately adopted convention on how this familywise confidence is computed (glossing over a controversial issue).

4. The hypothesis of selective influences is retained or rejected according as the combined confidence region contains or does not contain a point (a set of joint probabilities) which passes the population test in question. (Gradualized versions of this procedure are possible, when each point in the space of population-level probabilities is taken with the weight proportional to its likelihood.)

Instead of a confidence region of multivariate distributions based 
on a discretization, one can also generate confidence regions of 
distributions belonging to a specified class, say, multivariate nor- 
mal ones. 

Resampling techniques are another obvious approach, although the results will generally depend on one's often arbitrary choice of the resampling procedure. One simple choice is the permutation test in which the joint sample proportions $\hat{\Pr}[A_1 = a_1, \ldots, A_n = a_n]$ obtained at different treatments (and treated as probabilities) are randomly assigned to the treatments $\phi$. If the initial, observed assignment passes a test, while the proportion of the permuted assignments which pass the test is sufficiently small, the hypothesis of selective influences is considered supported.
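A minimal sketch of the permutation procedure just described is given below. It is an assumption-laden illustration, not a recommended statistical method: the function name `permutation_check`, the arguments `n_perm` and `seed`, and the idea of passing the population-level test as a predicate `passes` are all choices made for the example, and the toy data at the end stand in for real tables of sample proportions.

```python
import random

def permutation_check(tables, passes, n_perm=1000, seed=0):
    """tables: dict mapping each treatment to its table of sample proportions.
    passes: predicate taking such a dict and returning True if the
            population-level test is passed.
    Returns (observed assignment passes?, fraction of permuted assignments passing)."""
    rng = random.Random(seed)
    treatments = list(tables)
    dists = [tables[t] for t in treatments]
    observed = bool(passes(tables))
    hits = 0
    for _ in range(n_perm):
        shuffled = dists[:]
        rng.shuffle(shuffled)  # randomly reassign the tables to the treatments
        hits += bool(passes(dict(zip(treatments, shuffled))))
    return observed, hits / n_perm

# Toy stand-in: four "tables" and a predicate that only the original
# placement of table 'a' satisfies.
toy_tables = {1: "a", 2: "b", 3: "c", 4: "d"}
observed, fraction = permutation_check(toy_tables, lambda tb: tb[1] == "a")
```

How small `fraction` must be for the hypothesis to count as supported is left open here, as it is in the text.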



4. LINEAR FEASIBILITY TEST 

In this section we assume that each random variable $A_i(\phi)$ in $(A_1, \ldots, A_n)(\phi)$ has a finite number $m_i$ of possible values $a_{i1}, \ldots, a_{i m_i}$. It is arguably the most important special case, both because it is ubiquitous in psychological theories and because in all other cases random variables can be discretized into a finite number of categories. We are interested in establishing the truth or falsity of the diagram of selective influences (8), where each factor $\alpha_i$ in $(\alpha_1, \ldots, \alpha_n)$ contains $k_i$ factor points. The Linear Feasibility Test to be described is a direct application of JDC to this situation, furnishing both a necessary and sufficient condition for the diagram of selective influences $(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$.
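In the hypothetical JDC-vector

$H = (H_{x_1^{\alpha_1}}, \ldots, H_{x_{k_1}^{\alpha_1}}, \ \ldots, \ H_{x_1^{\alpha_n}}, \ldots, H_{x_{k_n}^{\alpha_n}})$,

since we assume that

$H_{x_j^{\alpha_i}} \sim A_i(\phi)$

for any $x_j^{\alpha_i}$ and any treatment $\phi$ containing $x_j^{\alpha_i}$, we know that the set of possible values for the random variable $H_{x_j^{\alpha_i}}$ is $\{a_{i1}, \ldots, a_{i m_i}\}$, irrespective of $x_j^{\alpha_i}$. Denote

$\Pr[(A_1 = a_{1 l_1}, \ldots, A_n = a_{n l_n})(x_{\lambda_1}^{\alpha_1}, \ldots, x_{\lambda_n}^{\alpha_n})] = P(l_1, \ldots, l_n; \lambda_1, \ldots, \lambda_n)$, (15)

where the indices $l_1, \ldots, l_n$ refer to the values of the r.v.s, the indices $\lambda_1, \ldots, \lambda_n$ refer to the factor points, and $l_i \in \{1, \ldots, m_i\}$ and $\lambda_i \in \{1, \ldots, k_i\}$ for $i = 1, \ldots, n$ ("r.v.s" abbreviates "random variables"). Denote

$\Pr[H_{x_1^{\alpha_1}} = a_{1 l_{11}}, \ldots, H_{x_{k_1}^{\alpha_1}} = a_{1 l_{1 k_1}}, \ \ldots, \ H_{x_1^{\alpha_n}} = a_{n l_{n1}}, \ldots, H_{x_{k_n}^{\alpha_n}} = a_{n l_{n k_n}}] = Q(l_{11}, \ldots, l_{1 k_1}, \ldots, l_{n1}, \ldots, l_{n k_n})$, (16)

where the indices $l_{11}, \ldots, l_{1 k_1}$ are for $A_1$, and so on up to $l_{n1}, \ldots, l_{n k_n}$ for $A_n$, with $l_{ij} \in \{1, \ldots, m_i\}$ for $i = 1, \ldots, n$. This gives us $m_1^{k_1} \times \ldots \times m_n^{k_n}$ $Q$-probabilities. A required joint distribution for the JDC-vector $H$ exists if and only if these probabilities can be found subject to $m_1^{k_1} \times \ldots \times m_n^{k_n}$ nonnegativity constraints

$Q(l_{11}, \ldots, l_{1 k_1}, \ldots, l_{n1}, \ldots, l_{n k_n}) \geq 0$, (17)

and (denoting by $n_T$ the number of treatments in $T$) $n_T \times m_1 \times \ldots \times m_n$ linear equations

$\sum Q(l_{11}, \ldots, l_{1 k_1}, \ldots, l_{n1}, \ldots, l_{n k_n}) = P(l_1, \ldots, l_n; \lambda_1, \ldots, \lambda_n)$, (18)

where the summation is across all possible values of the set

$\{l_{11}, \ldots, l_{1 k_1}, \ldots, l_{n1}, \ldots, l_{n k_n}\} - \{l_{1 \lambda_1}, \ldots, l_{n \lambda_n}\}$,

while

$l_{1 \lambda_1} = l_1, \ldots, l_{n \lambda_n} = l_n$.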



In reference to footnote [TTI this test has been proposed in the context of deal- 
ing with multiple-particle multiple-measurement quantum entanglement situ- 



ations by Werner & Wolf (2001a, b) and Basoalto & Percival (2003). 
Selective influences hold if and only if the system of these linear equalities with the nonnegativity constraints is feasible (i.e., has a solution). This is a typical linear programming problem (see, e.g., Webster, 1994, Ch. 4). Many standard statistical and mathematical packages can handle this problem.

Note that the maximal value for $n_T$ is $n_T = k_1 \times \ldots \times k_n$, whence the maximal number of linear equations is $(m_1 k_1) \times \ldots \times (m_n k_n)$. Since $m_i k_i \leq m_i^{k_i}$ (assuming $m_i, k_i \geq 2$), with the equality only achieved at $k_i = m_i = 2$, the system of linear equations is always underdetermined. In fact, the system of equations is underdetermined even if $k_i = m_i = 2$ for all $i = 1, \ldots, n$, because of the obvious linear dependences among the equations.

Example 4.1. Let $\alpha = \{1^\alpha, 2^\alpha\}$, $\beta = \{1^\beta, 2^\beta\}$, and the set of
allowable treatments $T$ consist of all four possible combinations of
the factor points. Let $A$ and $B$ be Bernoulli variables distributed
as shown:

    α  β    A  B    Pr
    1  1    0  0    .140
    1  1    0  1    .360
    1  1    1  0    .360
    1  1    1  1    .140

    2  1    0  0    .189
    2  1    0  1    .311
    2  1    1  0    .311
    2  1    1  1    .189

    1  2    0  0    .198
    1  2    0  1    .302
    1  2    1  0    .302
    1  2    1  1    .198

    2  2    0  0    .460
    2  2    0  1    .040
    2  2    1  0    .040
    2  2    1  1    .460

Marginal selectivity here is satisfied trivially: all marginal
probabilities are equal to 0.5, for all treatments. The linear programming
routine of Mathematica™ (using the interior point algorithm)
shows that the linear equations (18) have nonnegative solutions
corresponding to the JDC-vector

    H_{1^α}  H_{2^α}  H_{1^β}  H_{2^β}    Pr
       0        0        0        0       .02708610
       0        0        0        1       .00239295
       0        0        1        0       .16689300
       0        0        1        1       .03358610
       0        1        0        0       .00197965
       0        1        0        1       .10854100
       0        1        1        0       .00204128
       0        1        1        1       .15748000
       1        0        0        0       .15748000
       1        0        0        1       .00204128
       1        0        1        0       .10854100
       1        0        1        1       .00197965
       1        1        0        0       .03358610
       1        1        0        1       .16689300
       1        1        1        0       .00239295
       1        1        1        1       .02708610

This proves that in this case we do have $(A,B) \looparrowleft (\alpha,\beta)$. □
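
The claim of Example 4.1 can be checked without a linear programming routine: it suffices to verify that the reported JDC-vector is nonnegative, sums to one, and reproduces each of the four observed distributions as a 2-marginal. The sketch below does this. It assumes that the blank cells of the tables stand for the value 0, and the names `observed`, `Q` and `marginal` are illustrative only.

```python
from itertools import product

# Observed Pr[A=a, B=b] at treatment (x, y) of Example 4.1, keyed by (x, y) then (a, b).
observed = {
    (1, 1): {(0, 0): .140, (0, 1): .360, (1, 0): .360, (1, 1): .140},
    (2, 1): {(0, 0): .189, (0, 1): .311, (1, 0): .311, (1, 1): .189},
    (1, 2): {(0, 0): .198, (0, 1): .302, (1, 0): .302, (1, 1): .198},
    (2, 2): {(0, 0): .460, (0, 1): .040, (1, 0): .040, (1, 1): .460},
}

# Reported solution Q, listed in lexicographic order of (H_1a, H_2a, H_1b, H_2b);
# the second half of the table mirrors the first half.
half = [.02708610, .00239295, .16689300, .03358610,
        .00197965, .10854100, .00204128, .15748000]
Q = dict(zip(product((0, 1), repeat=4), half + half[::-1]))

def marginal(x, y, a, b):
    """Pr[H_{x^alpha} = a, H_{y^beta} = b] computed from Q."""
    return sum(q for h, q in Q.items() if h[x - 1] == a and h[2 + y - 1] == b)

max_error = max(abs(marginal(x, y, a, b) - p)
                for (x, y), table in observed.items()
                for (a, b), p in table.items())
print(min(Q.values()) >= 0, abs(sum(Q.values()) - 1) < 1e-6, max_error < 1e-6)
```

The tolerance of 1e-6 reflects the rounding of the printed solution to eight decimal places.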



Example 4.2. In the previous example, let us change the distributions
of $(A,B)$ to the following:

    α  β    A  B    Pr
    1  1    0  0    .450
    1  1    0  1    .050
    1  1    1  0    .050
    1  1    1  1    .450

    2  1    0  0    .170
    2  1    0  1    .330
    2  1    1  0    .330
    2  1    1  1    .170

    1  2    0  0    .105
    1  2    0  1    .395
    1  2    1  0    .395
    1  2    1  1    .105

    2  2    0  0    .110
    2  2    0  1    .390
    2  2    1  0    .390
    2  2    1  1    .110

Once again, marginal selectivity is satisfied trivially, as all
marginal probabilities are 0.5, for all treatments. The linear
programming routine of Mathematica™, however, shows that the
linear equations (18) have no nonnegative solutions. This
excludes the existence of a JDC-vector for this situation, thereby
ruling out the possibility of $(A,B) \looparrowleft (\alpha,\beta)$. □
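
The infeasibility in Example 4.2 can also be seen by hand. The following sketch is not the linear programming computation reported in the text; it is an elementary certificate, introduced here as an illustration. For four binary variables, the event that $H_{2^\alpha}$ differs from $H_{1^\beta}$ implies at least one of three other events, so any joint distribution must satisfy the corresponding inequality between probabilities. The observed probabilities violate it.

```python
from itertools import product

# Pr[A = B] at each treatment (x, y) of Example 4.2 (sum of the two diagonal cells).
p_equal = {(1, 1): .450 + .450, (2, 1): .170 + .170,
           (1, 2): .105 + .105, (2, 2): .110 + .110}

# Step 1: for every joint outcome (a1, a2, b1, b2) of the four binary JDC components
# (H_1a, H_2a, H_1b, H_2b), the event {a2 != b1} implies at least one of
# {a1 != b1}, {a1 == b2}, {a2 == b2}.
implication_holds = all(
    (a2 != b1) <= (a1 != b1) + (a1 == b2) + (a2 == b2)
    for a1, a2, b1, b2 in product((0, 1), repeat=4)
)

# Step 2: hence any joint distribution Q would have to satisfy
# Pr[A != B at (2,1)] <= Pr[A != B at (1,1)] + Pr[A = B at (1,2)] + Pr[A = B at (2,2)].
lhs = 1 - p_equal[(2, 1)]
rhs = (1 - p_equal[(1, 1)]) + p_equal[(1, 2)] + p_equal[(2, 2)]
print(implication_holds, round(lhs, 3), round(rhs, 3), lhs <= rhs)
# True 0.66 0.53 False
```

Since 0.66 exceeds 0.53, no nonnegative solution of the linear equations can exist, in agreement with the linear programming result.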

Since the Linear Feasibility Test is both a necessary and
sufficient condition for selective influences, if it is passed for
$(A_1,\ldots,A_n)(\phi)$, it is guaranteed to be passed following any
factor-point-specific transformations of these random outputs.
All such transformations in the case of discrete random variables
can be described as combinations of renamings (factor-point-specific
ones) and augmentations (grouping of some values
together). In fact, a result of the Linear Feasibility Test simply
does not depend on the values of the random variables involved;
only their probabilities matter. Therefore a renaming, such as in
Example 3.8, will not change anything in the system of linear
equations and inequalities (17)-(18). An example of augmentation
(or "coarsening") will be redefining $A$ and $B$, each having
possible values 1, 2, 3, 4, into binary variables

$$A^*(\phi) = \begin{cases} 0 & \text{if } A(\phi) = 1,2,\\ 1 & \text{if } A(\phi) = 3,4,\end{cases}
\qquad
B^*(\phi) = \begin{cases} 0 & \text{if } B(\phi) = 1,2,3,\\ 1 & \text{if } B(\phi) = 4.\end{cases}$$

It is clear that any such augmentation amounts to replacing
some of the equations in (18) with their sums. Therefore, if the
original system has a solution, so will the system after such
replacements.
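
A minimal sketch of the coarsening step, under the assumption that a joint distribution at one treatment is stored as a dictionary from value pairs to probabilities (the function `coarsen` and the uniform example distribution are hypothetical). Each probability of the coarsened pair is a sum of original probabilities, which is why the corresponding equations are simply added together.

```python
from collections import defaultdict

def coarsen(joint, group_a, group_b):
    """Joint pmf of (A*, B*) obtained by grouping the values of A and B."""
    out = defaultdict(float)
    for (a, b), p in joint.items():
        out[(group_a[a], group_b[b])] += p      # sum the cells that are merged
    return dict(out)

# Hypothetical joint distribution of (A, B) at one treatment: uniform on {1..4}^2.
joint = {(a, b): 1 / 16 for a in range(1, 5) for b in range(1, 5)}
group_a = {1: 0, 2: 0, 3: 1, 4: 1}      # A* as defined in the text
group_b = {1: 0, 2: 0, 3: 0, 4: 1}      # B* as defined in the text
coarse = coarsen(joint, group_a, group_b)
print(coarse)
```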

The same reasoning applies to one's redefining the factors by
grouping together some of the factor points: e.g., redefining
$\alpha = \{1^\alpha, 2^\alpha, 3^\alpha\}$ into

$$\alpha^* = \left\{\{1^\alpha, 2^\alpha\}^{\alpha^*}, \{3^\alpha\}^{\alpha^*}\right\} = \{1^{\alpha^*}, 2^{\alpha^*}\}.$$

This change will amount to replacing by their sum any two
equations whose right-hand sides correspond to identical vectors
$(l_1,\ldots,l_n;\lambda_1,\ldots,\lambda_n)$ except for the factor point for $\alpha$ being
1 in one of them and 2 in another.

Footnote (to the linear programming remark above): More precisely, this is
a linear programming task in the standard form and with a dummy objective
function (e.g., a linear combination with zero coefficients).

Summarizing, the Linear Feasibility Test cannot reject selec- 
tive influences on a coarser level of representation (for random 
variables and/or factors) and uphold it on a finer level (although 
the reverse, obviously, can happen). 

If the random variables involved have more than a finite number
of values and/or the factors consist of more than a finite number of
factor points, or if these numbers, though finite, are too large to
handle the ensuing linear programming problem, then the Linear
Feasibility Test can still be used after the values of the random
variables and/or factors have been appropriately grouped. The
Linear Feasibility Test then becomes only a necessary condition
for selective influences, and its results will generally be different
for different (non-nested) groupings.

Example 4.3. Consider the hypothesis $(A,B) \looparrowleft (\alpha,\beta)$ with the
factors having a finite number of factor points each, and $A$ and
$B$ being response times. To use the Linear Feasibility Test, one
can transform the random variable $A$ as, say,

$$A^*(\phi) = \begin{cases}
1 & \text{if } A(\phi) < a_{1/4}(\phi),\\
2 & \text{if } a_{1/4}(\phi) \le A(\phi) < a_{1/2}(\phi),\\
3 & \text{if } a_{1/2}(\phi) \le A(\phi) < a_{3/4}(\phi),\\
4 & \text{if } A(\phi) \ge a_{3/4}(\phi),
\end{cases}$$

and transform $B$ as

$$B^*(\phi) = \begin{cases}
1 & \text{if } B(\phi) < b_{1/2}(\phi),\\
2 & \text{if } B(\phi) \ge b_{1/2}(\phi),
\end{cases}$$

where $a_p(\phi)$ and $b_p(\phi)$ designate the $p$th quantiles of,
respectively, $A(\phi)$ and $B(\phi)$. The initial hypothesis now is reformulated
as $(A^*,B^*) \looparrowleft (\alpha,\beta)$, with the understanding that if it is rejected
then the initial hypothesis will be rejected too (a necessary
condition only). The Linear Feasibility Test will now be applied to
distributions of the form

    α  β    A*  B*    Pr
    x  y    1   1     p_11
    x  y    1   2     p_12
    ...
    x  y    4   1     p_41
    x  y    4   2     p_42

where the marginals for $A^*$ are constrained to 0.25 and the
marginals for $B^*$ to 0.5, for all treatments $\{x^\alpha, y^\beta\}$, yielding a
trivial compliance with marginal selectivity. Note that the test
may very well uphold $(A^*,B^*) \looparrowleft (\alpha,\beta)$ even if marginal
selectivity is violated for $(A,B)(\phi)$ (e.g., if the quantiles $a_p(x^\alpha,y^\beta)$
change as a function of $y^\beta$). □
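
The quantile-based discretization of Example 4.3 can be sketched as follows. The response times are simulated and purely hypothetical, and the helper `discretize` is an assumption of this sketch. The cut points are the sample quantiles computed separately at each treatment, which is what makes the transformation treatment-specific.

```python
import random
import statistics

def discretize(sample, n_bins):
    """Map each observation to 1..n_bins using the sample's own quantiles as cut points."""
    cuts = statistics.quantiles(sample, n=n_bins)      # n_bins - 1 cut points
    return [1 + sum(value >= c for c in cuts) for value in sample]

random.seed(0)
# Hypothetical response times (in seconds) observed at one treatment.
rt_a = [random.lognormvariate(-0.5, 0.4) for _ in range(1000)]
rt_b = [random.lognormvariate(-0.3, 0.5) for _ in range(1000)]
a_star = discretize(rt_a, 4)     # quartile bins, as for A* in the text
b_star = discretize(rt_b, 2)     # median split, as for B* in the text
print([a_star.count(v) / 1000 for v in (1, 2, 3, 4)])
print([b_star.count(v) / 1000 for v in (1, 2)])
```

By construction the marginal frequencies are close to 0.25 and 0.5, so only the joint frequencies of the discretized pair carry information for the test.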
Sample-level problems do not seem to present a serious difficulty.
The general approach mentioned in Section 3.4.2 is facilitated
by the following consideration. If a system of linear equations
and inequalities has an "interior" solution (one for which
all inequalities are satisfied in the strict form, which in our case
means that the solution contains no zeros), then the solution is
stable with respect to sufficiently small perturbations of its
coefficients. In our case, this means that if an interior solution
exists for population-level values of $P(l_1,\ldots,l_n;\lambda_1,\ldots,\lambda_n)$, and
if the sample estimates of the latter are sufficiently close to the
population values, then the system will also have a solution for
sample estimates. By the same token, if no solution exists for
the population-level values of $P(l_1,\ldots,l_n;\lambda_1,\ldots,\lambda_n)$, then no
solution will be found for sample estimates sufficiently close to
them. The only unstable situation arises if solutions exist on
the hypothetical population level (i.e., the selectiveness of
influences is satisfied), but they are all non-interior (contain zeros).

Remark 4.4. The question arises: how restrictive is the condition
of selective influences within the class of distributions
satisfying marginal selectivity? We do not know anything close
to a complete answer to this question, but simulations show
that selectivity of influence is not overly restrictive with
respect to marginal selectivity. Thus, if $k_i = m_i = 2$ for $i = 1,2$,
and if we constrain all marginal probabilities to 0.5 and pick
$P(1,1;1,1)$, $P(1,1;1,2)$, $P(1,1;2,1)$, $P(1,1;2,2)$ from four
independent uniform distributions between 0 and 0.5, the probability
of "randomly" obtaining selective influences is about 0.67. If
$k_i = m_i = 2$ for $i = 1,2,3$, and we constrain all 2-marginal
probabilities to 0.25, the analogous probability is about 0.10.
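
The first figure of Remark 4.4 can be approximated by a short simulation. The sketch below does not solve a linear program. It relies on an assumption not stated in the text: in the binary 2 x 2 case with marginal selectivity, feasibility of the linear system is equivalent to four Bell/CHSH-type inequalities (a standard result, commonly attributed to Fine, 1982). The function names are illustrative.

```python
import random

TREATMENTS = ((1, 1), (1, 2), (2, 1), (2, 2))

def chsh_ok(p):
    """Bell/CHSH-type check for the 2x2 binary case with all marginals equal to 0.5.

    p[(x, y)] is P(1,1; x,y); then Pr[A != B] at (x, y) equals 1 - 2 * p[(x, y)].
    """
    d = [1 - 2 * p[t] for t in TREATMENTS]
    total = sum(d)
    # Each of the four inequalities: 0 <= (sum of the other three) - d_i <= 2.
    return all(0 <= total - 2 * di <= 2 for di in d)

def estimate(n_draws, seed=0):
    """Fraction of random draws for which the assumed criterion is satisfied."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        p = {t: rng.uniform(0, 0.5) for t in TREATMENTS}
        hits += chsh_ok(p)
    return hits / n_draws

print(estimate(100_000))
```

Under the stated assumption the exact probability is 2/3, which agrees with the value of about 0.67 reported in the remark.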

5. DISTANCE-TYPE TESTS 
5.1. General theory 

First, we establish the general terminology related to distance-type
functions. Given a set $\mathfrak{R}$, a function $d : \mathfrak{R} \times \mathfrak{R} \to [0,\infty]$
is a premetric if $d(x,x) = 0$. The inclusion of the possibility
$d(x,y) = \infty$ usually adds the qualifier "extended" (in this case,
extended premetric), but we will omit it for brevity. A premetric
that satisfies the triangle inequality,

$$d(x,z) \le d(x,y) + d(y,z),$$

for any $x,y,z \in \mathfrak{R}$, is a pseudo-quasi-metric (p.q.-metric, for
short). A p.q.-metric which is symmetric,

$$d(x,y) = d(y,x),$$

for all $x,y \in \mathfrak{R}$, is a pseudometric. A p.q.-metric such that

$$x \ne y \Rightarrow d(x,y) > 0$$

(equivalently, $d(x,y) = 0$ if and only if $x = y$) is a quasimetric.
A p.q.-metric which is simultaneously a quasimetric and a
pseudometric is a conventional (symmetric) metric. The words
"metric" and "distance" can be used interchangeably: so one can
speak of conventional (symmetric) distances, pseudodistances,
quasidistances, and p.q.-distances.

We are interested in the situation when $\mathfrak{R}$ is a set of jointly
distributed random variables (discrete, continuous, or mixed),
with the intent to apply a distance-type function definable on
such an $\mathfrak{R}$ to the JDC-vector $H$ of random variables for the
diagram of selective influences (8). The random variables $A(\phi) =
(A_1,\ldots,A_n)(\phi)$, the factors $\Phi = \{\alpha_1,\ldots,\alpha_n\}$, and the set of
treatments $T$ are defined as above. The main property we are
concerned with is the triangle inequality, that is, it is typically
sufficient to know that the distance-type function we are dealing
with is a p.q.-metric.

Footnote: The terminology adopted in this paper is conventional but not
universal. In particular, the term "metric" or "distance" is sometimes used
to mean pseudometric. In the context of Finsler geometry and the
dissimilarity cumulation theory (Dzhafarov, 2010) the term "metric" is used
to designate quasimetric with an additional property of being "symmetric in
the small."

The function (11) considered in Section 3.4.1 serves as an
introductory example of a metric on which one can base a test for
selective influences. As a simple example of using a p.q.-metric
which is not a conventional metric (in fact, not even a
pseudometric or quasimetric), consider the following. Let the elements
of $\mathfrak{R}$ be binary random variables, with values $\{1,2\}$. Define, for
any $A_1,\ldots,A_p,B_1,\ldots,B_q \in \mathfrak{R}$,

$$P^{(2)}[(A_1,\ldots,A_p)(B_1,\ldots,B_q)] = \Pr\left[A_i = 1 \text{ for } i = 1,\ldots,p;\ B_j = 2 \text{ for } j = 1,\ldots,q\right].$$

The parentheses may be dropped around singletons, in particular,

$$\Pr[A = 1, B = 2] = P^{(2)}[(A)(B)] = P^{(2)}[AB].$$

The latter is clearly a premetric: $P^{(2)}$ is nonnegative, and
$P^{(2)}[RR] = 0$, for any $R \in \mathfrak{R}$. To prove the triangle inequality,

$$P^{(2)}[R_1R_2] \le P^{(2)}[RR_2] + P^{(2)}[R_1R],$$

for any $R_1,R_2,R \in \mathfrak{R}$, observe that

$$P^{(2)}[R_1R_2] = P^{(2)}[(R_1,R)R_2] + P^{(2)}[R_1(R,R_2)],$$

$$P^{(2)}[RR_2] = P^{(2)}[(R_1,R)R_2] + P^{(2)}[R(R_1,R_2)],$$

$$P^{(2)}[R_1R] = P^{(2)}[(R_1,R_2)R] + P^{(2)}[R_1(R_2,R)],$$

whence

$$P^{(2)}[RR_2] + P^{(2)}[R_1R] - P^{(2)}[R_1R_2] = P^{(2)}[R(R_1,R_2)] + P^{(2)}[(R_1,R_2)R] \ge 0.$$

Note that $P^{(2)}$ is not a pseudometric because generally

$$P^{(2)}[R_1R_2] = \Pr[R_1 = 1, R_2 = 2] \ne \Pr[R_2 = 1, R_1 = 2] = P^{(2)}[R_2R_1].$$

Nor is it a quasimetric because it may very well happen that
$R_1 \ne R_2$ but

$$P^{(2)}[R_1R_2] = \Pr[R_1 = 1, R_2 = 2] = 0.$$
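
As a numerical illustration of the argument above, the sketch below draws an arbitrary joint distribution of three binary variables (values 1 and 2) and checks the triangle inequality for $P^{(2)}$ over all ordered triples. The joint distribution is hypothetical and the helper names are assumptions of this sketch.

```python
import random
from itertools import product

def p2(joint, i, j):
    """P^(2)[R_i R_j] = Pr[R_i = 1, R_j = 2] for a joint pmf over tuples of values in {1, 2}."""
    return sum(p for r, p in joint.items() if r[i] == 1 and r[j] == 2)

def random_joint(n_vars, rng):
    """Arbitrary (hypothetical) joint pmf of n_vars binary variables with values 1 and 2."""
    weights = {r: rng.random() for r in product((1, 2), repeat=n_vars)}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}

rng = random.Random(1)
joint = random_joint(3, rng)          # components play the roles of R1, R2, R
triangle_ok = all(
    p2(joint, i, j) <= p2(joint, i, k) + p2(joint, k, j) + 1e-12
    for i, j, k in product(range(3), repeat=3)
)
print(triangle_ok, p2(joint, 0, 0), p2(joint, 0, 1), p2(joint, 1, 0))
```

The second printed value illustrates the premetric property, and comparing the last two values shows that the function is generally not symmetric.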

To use this p.q.-metric for our purposes: each random variable
$H_{x^\alpha}$ in the hypothetical JDC-vector $H$ has a set of possible values
$\mathcal{A}_\alpha$, in which we choose and fix a measurable subset $\mathcal{A}^+_{x^\alpha}$ and its
complement $\mathcal{A}^-_{x^\alpha}$. Note that $\mathcal{A}_\alpha$ is the same for all factor points
of the factor $\alpha$ (and coincides with the spectrum of the random
variable in the diagram (8) which is supposed to be selectively
influenced by $\alpha$). Transform each $H_{x^\alpha}$ as

$$R_{x^\alpha} = \begin{cases} 1 & \text{if } H_{x^\alpha} \in \mathcal{A}^-_{x^\alpha},\\ 2 & \text{if } H_{x^\alpha} \in \mathcal{A}^+_{x^\alpha},\end{cases} \tag{19}$$

and define, for each pair of factor points $x^\alpha, y^\beta$,

$$Dx^\alpha y^\beta = P^{(2)}\left[R_{x^\alpha} R_{y^\beta}\right]. \tag{20}$$

Here, once again (see Section 3.4.1), we write $x^\alpha y^\beta$ in place of
$(x^\alpha, y^\beta)$. This time we are going to formalize this notation as
part of the following general convention: any chain (a finite
sequence) of factor points will be written as a string of symbols,
without commas and parentheses, such as $x_1^{\alpha_1} \ldots x_l^{\alpha_l}$, $x^\alpha y^\beta z^\gamma$, etc.

The value of $Dx^\alpha y^\beta$ is computable for any $x^\alpha y^\beta$ which is part
of a treatment $\phi \in T$. The test therefore consists in checking
whether

$$Dx_1^{\alpha_1} x_l^{\alpha_l} \le Dx_1^{\alpha_1} x_2^{\alpha_2} + \ldots + Dx_{l-1}^{\alpha_{l-1}} x_l^{\alpha_l} \tag{21}$$

for any chain of factor points $x_1^{\alpha_1} \ldots x_l^{\alpha_l}$ ($l \ge 3$) satisfying (12) and
such that for some treatments $\phi^{(1)},\ldots,\phi^{(l)} \in T$ (not necessarily
pairwise distinct),

$$\{x_1^{\alpha_1}, x_l^{\alpha_l}\} \subset \phi^{(1)},\ \{x_1^{\alpha_1}, x_2^{\alpha_2}\} \subset \phi^{(2)},\ \ldots,\ \{x_{l-1}^{\alpha_{l-1}}, x_l^{\alpha_l}\} \subset \phi^{(l)}. \tag{22}$$

Note that this is just another way of writing (13)-(14). If the
test is failed (i.e., the inequality is violated) for at least one such
sequence of factor points, then the hypothesis $(A_1,\ldots,A_n) \looparrowleft
(\alpha_1,\ldots,\alpha_n)$ is rejected. In the following we will refer to any
sequence of factor points $x_1^{\alpha_1} \ldots x_l^{\alpha_l}$ ($l \ge 3$) subject to (12) and
(22) as a treatment-realizable chain.

Example 5.1. Let $\alpha = \{1^\alpha, 2^\alpha\}$, $\beta = \{1^\beta, 2^\beta\}$, and the set of
allowable treatments $T$ consist of all four possible combinations
of the factor points. Let $(A,B)$ be bivariate normally distributed
at every treatment $\phi$, with standard normal marginals and with
correlations

    ρ = -.9   at {x^α, y^β} = {1^α, 1^β},
    ρ = +.9   at {x^α, y^β} = {1^α, 2^β},
    ρ = +.9   at {x^α, y^β} = {2^α, 1^β},
    ρ = -.1   at {x^α, y^β} = {2^α, 2^β}.

We form variables

$$A^*(\phi) = \begin{cases} 1 & \text{if } A(\phi) < 0,\\ 2 & \text{if } A(\phi) \ge 0,\end{cases}
\qquad
B^*(\phi) = \begin{cases} 1 & \text{if } B(\phi) < 0,\\ 2 & \text{if } B(\phi) \ge 0,\end{cases}$$

with all marginals obviously constrained to 0.5, for all treatments.
The joint distributions are computed to be

    α  β    A*  B*    Pr
    1  1    1   2     .428217
    1  2    1   2     .0717831
    2  1    1   2     .0717831
    2  2    1   2     .265942

where for each treatment $\phi$ we only show the probabilities
$\Pr[A^* = 1, B^* = 2] = P^{(2)}[A^*B^*]$, other probabilities being
irrelevant for our computations. Since $\{1^\alpha,1^\beta\}$, $\{1^\alpha,2^\beta\}$, $\{2^\alpha,2^\beta\}$,
and $\{2^\alpha,1^\beta\}$ are all allowable treatments, $1^\alpha 2^\beta 2^\alpha 1^\beta$ is a
treatment-realizable chain. We can put therefore

$$Dx^\alpha y^\beta = P^{(2)}\left[A^*(x^\alpha,y^\beta)\, B^*(x^\alpha,y^\beta)\right]$$

and observe that

$$0.428217 = D1^\alpha 1^\beta > D1^\alpha 2^\beta + D2^\alpha 2^\beta + D2^\alpha 1^\beta = 0.409508.$$

This violation of the chain inequality rules out $(A,B) \looparrowleft (\alpha,\beta)$. □
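
The numbers in Example 5.1 can be reproduced with the classical orthant-probability formula for a standard bivariate normal pair with correlation $\rho$, namely $\Pr[A < 0, B \ge 0] = 1/4 - \arcsin(\rho)/(2\pi)$. This formula is not stated in the text and is used here as an assumption of the sketch; the function name is illustrative.

```python
from math import asin, pi

def pr_a1_b2(rho):
    """Pr[A < 0, B >= 0] for a standard bivariate normal pair with correlation rho."""
    return 0.25 - asin(rho) / (2 * pi)

rho = {(1, 1): -0.9, (1, 2): 0.9, (2, 1): 0.9, (2, 2): -0.1}
D = {t: pr_a1_b2(r) for t, r in rho.items()}    # D[(x, y)] = Pr[A* = 1, B* = 2] at (x, y)

# Chain 1a 2b 2a 1b: the end-to-end term against the sum of the three links.
# By the symmetry of the bivariate normal, Pr[B* = 1, A* = 2] = Pr[A* = 1, B* = 2].
lhs = D[(1, 1)]
rhs = D[(1, 2)] + D[(2, 2)] + D[(2, 1)]
print(round(lhs, 6), round(rhs, 6), lhs <= rhs)
# 0.428217 0.409508 False
```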

The formulation of the test (21), subject to (12) and (22), is
valid for any p.q.-metric $D$ imposed on the hypothetical JDC-vector
$H$ for the diagram (8). It turns out, however, that using all
possible treatment-realizable chains $x_1^{\alpha_1} \ldots x_l^{\alpha_l}$ of factor points
would be redundant, in view of the lemma below. For its
formulation we need an additional concept. A treatment-realizable
chain $x_1^{\alpha_1} \ldots x_l^{\alpha_l}$ ($l \ge 3$) is called irreducible if

1. the only nonempty subsets thereof that are subsets of
treatments are the pairs listed in (22), and

2. no factor point in it occurs more than once.

Thus, a triadic treatment-realizable chain $x^\alpha y^\beta z^\gamma$ is irreducible
if and only if there is no treatment $\phi$ that includes $\{x^\alpha, y^\beta, z^\gamma\}$.
Tetradic treatment-realizable chains of the form $x^\alpha y^\beta u^\alpha v^\beta$ are
irreducible if and only if $x^\alpha \ne u^\alpha$ and $y^\beta \ne v^\beta$.

Theorem 5.2 (Distance-type Tests). Given a p.q.-metric $D$ on
the hypothetical JDC-vector $H$ for the diagram (8), the inequality
(21) is satisfied for all treatment-realizable chains if and only
if this inequality holds for all irreducible chains.



This theorem is an immediate consequence of Lemma lA.l II 
in the appendix, where it is proved for a general set-up involving 
arbitrary sets of random entities and factors. 

Note that if $T$ includes all possible combinations of factor
points, $T = \alpha_1 \times \ldots \times \alpha_n$ ("completely crossed design"), then
the condition of treatment-realizability is equivalent to (12). In
this situation any set of factor points belonging to pairwise
different factors (e.g., $\{x^\alpha, y^\beta\}$, or $\{x^\alpha, y^\beta, z^\gamma\}$ with $\alpha \ne \beta \ne \gamma \ne \alpha$)
belongs to some treatment, whence an irreducible chain cannot
contain factor points of more than two distinct factors: they must
all be of the form $x_1^\alpha x_2^\beta x_3^\alpha x_4^\beta \ldots x_{2k-1}^\alpha x_{2k}^\beta$ ($\alpha \ne \beta$). It is easy to see,
however, that if $k > 2$, each of the subsets $\{x_1^\alpha, x_4^\beta\}$ and $\{x_2^\beta, x_5^\alpha\}$
belongs to a treatment. It follows that all irreducible chains
in a completely crossed design are of the form $x^\alpha y^\beta u^\alpha v^\beta$, with
$\alpha \ne \beta$, $x^\alpha \ne u^\alpha$ and $y^\beta \ne v^\beta$.

Theorem 5.3 (Distance-type Tests for Completely Crossed Designs).
If the set of treatments $T$ consists of all possible combinations
of factor points, then the inequality (21) is satisfied for
all treatment-realizable sequences of factor points if and only
if this inequality holds for all tetradic sequences of the form
$x^\alpha y^\beta u^\alpha v^\beta$, with $\alpha \ne \beta$, $x^\alpha \ne u^\alpha$ and $y^\beta \ne v^\beta$.

This formulation is given in Dzhafarov and Kujala (2010), although
there it is unnecessarily confined to metrics of a special
kind, denoted $M^{(p)}$ below.



5.2. Classes of p.q.-metrics 

Let us consider some classes of p.q.-metrics that can be used 
for distance-type tests. We do not attempt a systematization 
or maximal generality, our goals being to show the reader how 
broad the spectrum of the usable p.q.-metrics is, and how easy it 
is to generate new ones. 



5.2.1. Minkowski-type metrics

These are (conventional, symmetric) metrics of the type

$$M^{(p)}(A,B) = \begin{cases}
\left(E\left[|A-B|^p\right]\right)^{1/p} & \text{for } 1 \le p < \infty,\\
\operatorname{ess\,sup}|A-B| & \text{for } p = \infty,
\end{cases} \tag{23}$$

where

$$\operatorname{ess\,sup}|A-B| = \inf\{v : \Pr[|A-B| \le v] = 1\}.$$

In the context of selective influences these metrics have been
introduced in Kujala and Dzhafarov (2008) and further analyzed in
Dzhafarov and Kujala (2010). The metric $M$ discussed in Section 3.4.1
is a special case ($p = 1$). An important property of
$M^{(p)}$ is that the result of an $M^{(p)}$-based distance-type test is not
invariant with respect to factor-point-specific transformations of
the random variables. This allows one to conduct an infinity of
different tests on one and the same $A(\phi) = (A_1,\ldots,A_n)(\phi)$. For
numerous examples of how the test works see Kujala and Dzhafarov
(2008) and Dzhafarov and Kujala (2010).



5.2.2. Classification p.q.-metrics 

Classification p.q.-metrics are the p.q.-metrics defined
through the p.q.-metric $P^{(2)}$ by (20), following a transformation
(19). The general definition is that for each random variable $X$
in a set $\mathfrak{R}$ of jointly distributed random variables we designate
two complementary events $E_X^-$ and $E_X^+$, and put

$$D_C(A,B) = \Pr\left[E_A^- \,\&\, E_B^+\right].$$

The results of a $D_C$-based distance-type test for selective
influences depend on the choice of the events $E_X^+$, so different
choices would lead to different tests for one and the same
$A(\phi) = (A_1,\ldots,A_n)(\phi)$. See Example 5.1 for an illustration.

To the best of our knowledge this interesting p.q.-metric was
not previously considered in mathematics. One standard way
to generalize it (see the principles of constructing derivative
metrics in Section 5.2.4 below) is to make the set of events
$\{E_X^+ : X \in \mathfrak{R}\}$ a random entity. In the special case when all
random variables in $\mathfrak{R}$ take their values in the set of real numbers,
and $E_X^+$ for each $X \in \mathfrak{R}$ is defined by $X > v$, the "randomization"
of $\{E_X^+ : X \in \mathfrak{R}\}$ reduces to that of $v$. The p.q.-metric then
becomes

$$D_S(A,B) = \Pr[A \le V < B],$$

where $V$ is a random variable. An additively symmetrized (i.e.,
pseudometric) version of this p.q.-metric, $D_S(A,B) + D_S(B,A)$,
was introduced in Taylor (1984, 1985) under the name "separation
(pseudo)metric," and shown to be a conventional metric if
$V$ is chosen stochastically independent of all random variables
in $\mathfrak{R}$.



5.2.3. Information-based p.q.-metric

Let the jointly distributed random variables constituting the
set $\mathfrak{R}$ be all discrete. Perhaps the simplest information-based
p.q.-metric is

$$h(A|B) = -\sum_{a,b} p_{AB}(a,b) \log \frac{p_{AB}(a,b)}{p_B(b)},$$

with the conventions $0 \log \frac{0}{0} = 0 \log 0 = 0$. This function is
called conditional entropy. The identity $h(A|A) = 0$ is obvious,
and the triangle inequality,

$$h(A|B) \le h(A|C) + h(C|B),$$

follows from the standard information theory (in)equalities,

$$h(A|B) \le h(A,C|B),$$

$$h(A,C|B) = h(A|C,B) + h(C|B),$$

and

$$h(A|C,B) \le h(A|C).$$

Note that the test of selectiveness based on $h(A|B)$ (and any
other information-based measure) is invariant with respect to all
bijective transformations of the variables.

The additively symmetrized (i.e., pseudometric) version of
this p.q.-metric, $h(A|B) + h(B|A)$, is well-known (Cover &
Thomas, 1990). Normalized versions of $h(A|B)$ are also of
interest, for instance,

$$h_N(A|B) = \frac{2h(A|B)}{h(A,B)},$$

where

$$h(A,B) = -\sum_{a,b} p_{AB}(a,b) \log p_{AB}(a,b)$$

is the joint entropy of $A$ and $B$; $h_N(A|B)$ is bound between 0
(attained when $A$ is a bijective transformation of $B$) and 1 (when $A$
and $B$ are independent). A proof of the triangle inequality for
$h_N$ can be found in Kraskov et al. (2003), as part of their proof that
$\frac{1}{2}\left[h_N(A|B) + h_N(B|A)\right]$ is a pseudometric.
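
A minimal sketch of the conditional entropy $h(A|B)$ and of its triangle inequality, computed for an arbitrary (hypothetical) joint distribution of three discrete variables. The function `cond_entropy` and the representation of a joint distribution as a dictionary over value tuples are assumptions of this sketch.

```python
import random
from itertools import product
from math import log

def cond_entropy(joint, target, given):
    """h(target | given) for a joint pmf over tuples; target/given are index tuples."""
    def marg(idx):
        out = {}
        for r, p in joint.items():
            key = tuple(r[i] for i in idx)
            out[key] = out.get(key, 0.0) + p
        return out
    both, cond = marg(target + given), marg(given)
    # Zero-probability cells are skipped, matching the convention 0 log 0 = 0.
    return -sum(p * log(p / cond[key[len(target):]])
                for key, p in both.items() if p > 0)

rng = random.Random(2)
weights = {r: rng.random() for r in product(range(3), repeat=3)}   # hypothetical (A, B, C)
total = sum(weights.values())
joint = {r: w / total for r, w in weights.items()}

A, B, C = (0,), (1,), (2,)
print(cond_entropy(joint, A, B) <= cond_entropy(joint, A, C) + cond_entropy(joint, C, B))
print(abs(cond_entropy(joint, A, A)) < 1e-12)
```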



5.2.4. Constructing p.q.-metrics from other p.q.-metrics

There are numerous ways of creating new p.q.-metrics from
the ones mentioned above, or from ones taken from outside the
probabilistic context. Thus, if $d$ is a p.q.-metric on a set $S$, then,
for any space $\mathfrak{R}$ of jointly distributed random variables taking
their values in $S$,

$$D(A,B) = E[d(A,B)], \quad A,B \in \mathfrak{R},$$

is a p.q.-metric on $\mathfrak{R}$. This follows from the fact that expectation
$E$ preserves inequalities and equalities identically satisfied for
all possible realizations of the arguments. Thus, the distance
$M(A,B) = E[|A-B|]$ of Section 3.4.1 trivially obtains from the
metric $d(a,b) = |a-b|$ on reals. In the same way one obtains
the well-known Fréchet distance

$$F(A,B) = E\left[\frac{|A-B|}{1+|A-B|}\right].$$

Below we present an incomplete list of transformations
which, given a p.q.-metric (quasimetric, pseudometric, conventional
metric) $D$ on a space $\mathfrak{R}$ of jointly distributed random
variables, produce a new p.q.-metric (respectively, quasimetric,
pseudometric, or conventional metric) on the same space. The
proofs are trivial or well-known, so we omit them. The arrows
$\Rightarrow$ should be read "can be transformed into."

1. $D \Rightarrow D^q$ ($q < 1$). In this way, for example, we can obtain
metrics

$$M^{(p,q)}(A,B) = \begin{cases}
\left(E\left[|A-B|^p\right]\right)^{q/p} & \text{for } 1 \le p < \infty,\ q < 1,\\
\left(\operatorname{ess\,sup}|A-B|\right)^q & \text{for } p = \infty,\ q < 1,
\end{cases}$$

from the metrics $M^{(p)}$ in (23).

2. $D \Rightarrow D/(1+D)$. This is a standard way of creating a
bounded p.q.-metric.

3. $D_1,D_2 \Rightarrow \max\{D_1,D_2\}$ or $D_1,D_2 \Rightarrow D_1 + D_2$. These
transformations can be used to symmetrize p.q.-metrics:
$D(A,B) + D(B,A)$ or $\max\{D(A,B), D(B,A)\}$.

4. A generalization of the previous: $\{D_\upsilon : \upsilon \in \Upsilon\} \Rightarrow
\sup\{D_\upsilon\}$ and $\{D_\upsilon : \upsilon \in \Upsilon\} \Rightarrow E[D_V]$, where
$\{D_\upsilon : \upsilon \in \Upsilon\}$ is a family of p.q.-metrics, and $V$
designates a random entity distributed as $(\Upsilon, \Sigma_\Upsilon, m)$, so
that

$$D(A,B) = \int_{\upsilon \in \Upsilon} D_\upsilon(A,B)\, dm(\upsilon).$$

We have discussed in Section 5.2.2 how such a procedure
leads from our "classification" p.q.-metrics $D_C$ to
"separation" p.q.-metrics $D_S$.



6. NON-DISTANCE TESTS 

The general principle of constructing tests for selective
influences presented in Section 3.4.1 does not only lead to
distance-type tests. In this section we will consider two examples, one
proposed previously and one new, of tests in which the
functionals $g$ mentioned in Section 3.4.1 are, respectively,
two-argument but not distance-type, and multiple-argument ones.
Recall that the tests in question are only necessary conditions
for selective influences (in the form of the diagram (8)).



6.1. Cosphericity test 

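Given a hypothetical JDC-vector

$$H = \left(H_{x_1^{\alpha_1}},\ldots,H_{x_{k_1}^{\alpha_1}},\ldots,H_{x_1^{\alpha_n}},\ldots,H_{x_{k_n}^{\alpha_n}}\right)$$

with real-valued random variables, the following statement
should be satisfied: for any quadruple of factor points
$\{x^\alpha, y^\beta, u^\alpha, v^\beta\}$ with $\alpha \ne \beta$ such that for some treatments
$\phi_1,\phi_2,\phi_3,\phi_4 \in T$,

$$\{x^\alpha,y^\beta\} \subset \phi_1,\ \{x^\alpha,v^\beta\} \subset \phi_2,\ \{u^\alpha,y^\beta\} \subset \phi_3,\ \{u^\alpha,v^\beta\} \subset \phi_4,$$

we have

$$\left|\rho_{x^\alpha y^\beta}\rho_{x^\alpha v^\beta} - \rho_{u^\alpha y^\beta}\rho_{u^\alpha v^\beta}\right|
\le \sqrt{1-\rho^2_{x^\alpha y^\beta}}\sqrt{1-\rho^2_{x^\alpha v^\beta}} + \sqrt{1-\rho^2_{u^\alpha y^\beta}}\sqrt{1-\rho^2_{u^\alpha v^\beta}},$$

where $\rho_{x^\alpha y^\beta}$ denotes the correlation between $H_{x^\alpha}$ and $H_{y^\beta}$, $\rho_{x^\alpha v^\beta}$
denotes the correlation between $H_{x^\alpha}$ and $H_{v^\beta}$, etc. Ergo, if the
inequality is violated for at least one such quadruple of factor
points, the JDC-vector cannot exist, and the diagram of selective
influences (8) should be rejected. For numerous illustrations see
Kujala and Dzhafarov (2008), where this test has been proposed,
and where it is also shown that for two bivariate normally
distributed variables in a 2 x 2 factorial design this test is both a
necessary and sufficient condition for selective influences.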
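
As an illustration only, the sketch below applies a cosphericity check to the four correlations of Example 5.1. It assumes that the inequality has the form $|\rho_{xy}\rho_{xv} - \rho_{uy}\rho_{uv}| \le \sqrt{1-\rho_{xy}^2}\sqrt{1-\rho_{xv}^2} + \sqrt{1-\rho_{uy}^2}\sqrt{1-\rho_{uv}^2}$; the function name and the argument order are assumptions of this sketch.

```python
from math import sqrt

def cosphericity_ok(r_xy, r_xv, r_uy, r_uv):
    """Check |r_xy*r_xv - r_uy*r_uv| <= sqrt(1-r_xy^2)*sqrt(1-r_xv^2) + sqrt(1-r_uy^2)*sqrt(1-r_uv^2)."""
    lhs = abs(r_xy * r_xv - r_uy * r_uv)
    rhs = (sqrt(1 - r_xy ** 2) * sqrt(1 - r_xv ** 2)
           + sqrt(1 - r_uy ** 2) * sqrt(1 - r_uv ** 2))
    return lhs <= rhs

# Correlations of Example 5.1: x = 1^alpha, u = 2^alpha, y = 1^beta, v = 2^beta.
print(cosphericity_ok(-0.9, 0.9, 0.9, -0.1))   # False
print(cosphericity_ok(0.0, 0.0, 0.0, 0.0))     # True
```

Under this assumed form, the correlations of Example 5.1 fail the check (0.72 against roughly 0.62), which is consistent with the rejection obtained there by the distance-type test.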



6.2. Diversity Test 

The p.q.-metric $P^{(2)}$ introduced in Section 5 lends itself to an
interesting generalization. Let $\mathfrak{R}$ be a set of jointly distributed
random variables, each having $\{1,2,\ldots,s\}$ as its set of possible
values. Define

$$P^{(s)}\left[(R^1_1,\ldots,R^1_{k_1}) \ldots (R^s_1,\ldots,R^s_{k_s})\right]
= \Pr\left[R^i_j = i, \text{ for } j = 1,\ldots,k_i \text{ and } i = 1,\ldots,s\right].$$

In particular,

$$\Pr[R_1 = 1,\ldots,R_s = s] = P^{(s)}[(R_1)\ldots(R_s)].$$

It is easy to show that the latter is a generalized p.q.-distance,
in the sense of satisfying the following two properties: for any
$R_1,\ldots,R_s,R \in \mathfrak{R}$,

1. (generalized premetric) $P^{(s)}[(R_1)\ldots(R_s)]$ is nonnegative,
and it is zero if any two of $R_1,\ldots,R_s$ are identical.

2. (simplicial inequality):

$$P^{(s)}[(R_1)\ldots(R_s)] \le \sum_{i=1}^{s} P^{(s)}[(R_1)\ldots(R)\ldots(R_s)],$$

where in the $i$th summand on the right, $R_i$ in the sequence
$(R_1)\ldots(R_i)\ldots(R_s)$ is replaced with $R$ ($i = 1,\ldots,s$), the
rest of the sequence remaining intact.

The generalized premetric property is obvious. To avoid
cumbersome notation, let us prove the simplicial inequality for $s = 3$,
the generalization to arbitrary $s$ being straightforward. We
drop in $P^{(3)}$ the parentheses around singletons: $P^{(3)}[R_1R_2R_3]$,
$P^{(3)}[R_1(R_2,R)R_3]$, etc. The simplicial inequality in question is

$$P^{(3)}[R_1R_2R_3] \le P^{(3)}[RR_2R_3] + P^{(3)}[R_1RR_3] + P^{(3)}[R_1R_2R].$$

We have

$$P^{(3)}[R_1R_2R_3] = P^{(3)}[(R_1,R)R_2R_3] + P^{(3)}[R_1(R_2,R)R_3] + P^{(3)}[R_1R_2(R_3,R)],$$

$$P^{(3)}[RR_2R_3] = P^{(3)}[(R_1,R)R_2R_3] + P^{(3)}[R(R_1,R_2)R_3] + P^{(3)}[RR_2(R_1,R_3)],$$

and analogously for $P^{(3)}[R_1RR_3]$ and $P^{(3)}[R_1R_2R]$. Then

$$P^{(3)}[RR_2R_3] + P^{(3)}[R_1RR_3] + P^{(3)}[R_1R_2R] - P^{(3)}[R_1R_2R_3]$$
$$= P^{(3)}[R(R_1,R_2)R_3] + P^{(3)}[RR_2(R_1,R_3)] + P^{(3)}[(R_1,R_2)RR_3]$$
$$+ P^{(3)}[R_1R(R_2,R_3)] + P^{(3)}[(R_1,R_3)R_2R] + P^{(3)}[R_1(R_2,R_3)R] \ge 0.$$

We call $P^{(s)}$ a diversity function. To use this function for a
test of selective influences, for each random variable $H_{x^\alpha}$ in the
hypothetical JDC-vector $H$ we partition the set of its possible
values $\mathcal{A}_\alpha$ into $s$ pairwise disjoint subsets $\mathcal{A}^1_{x^\alpha},\ldots,\mathcal{A}^s_{x^\alpha}$, and we
transform $H_{x^\alpha}$ as

$$R_{x^\alpha} = \begin{cases} 1 & \text{if } H_{x^\alpha} \in \mathcal{A}^1_{x^\alpha},\\ \ \vdots & \\ s & \text{if } H_{x^\alpha} \in \mathcal{A}^s_{x^\alpha}.\end{cases}$$

Define

$$Dx_1^{\alpha_1} \ldots x_s^{\alpha_s} = P^{(s)}\left[R_{x_1^{\alpha_1}} \ldots R_{x_s^{\alpha_s}}\right].$$

Footnote (to the simplicial inequality): With the addition of
permutation-invariance, functions $\mathfrak{R}^s \to \mathbb{R}$ (with $\mathfrak{R}$
an arbitrary set) satisfying these properties are sometimes called
$(s-1)$-semimetrics (Deza & Rosenberg, 2000); with the addition of the
property that $P^{(s)} > 0$ if no two arguments thereof are equal, they become
$(s-1)$-metrics.
Let us restrict the consideration to $s = 3$ again. Assuming all
factor points mentioned below belong to $\bigcup \Phi$, and given a triadic
chain of factor points $t = x^\alpha y^\beta z^\gamma$ (with the elements pairwise
distinct), we define a certain set of triadic chains referred to as a
polyhedral set over $t$.

1. For any triadic chain $t = x^\alpha y^\beta z^\gamma$ ($x^\alpha \ne y^\beta \ne z^\gamma \ne x^\alpha$) and
any $u^\mu \notin \{x^\alpha, y^\beta, z^\gamma\}$, the set $\{u^\mu y^\beta z^\gamma, x^\alpha u^\mu z^\gamma, x^\alpha y^\beta u^\mu\}$ is a
polyhedral set over $t$;

2. For any triadic chains $t$ and $t'$, if $\mathfrak{P}$ is a polyhedral set over
$t$, and $\mathfrak{P}'$ is a polyhedral set over any $t' \in \mathfrak{P}$, then the set
$(\mathfrak{P} - \{t'\}) \cup \mathfrak{P}'$ is a polyhedral set over $t$.

3. Any polyhedral set over any triadic chain $t$ is obtained by
a finite number of applications of 1 and 2 above.

We call such a set polyhedral because if one interprets each
element of it as a list of vertices forming a (triangular) face, then
the whole set, combined with the root face $t$, forms a complete
polyhedron.

A polyhedral set $\mathfrak{P}$ over $t = x^\alpha y^\beta z^\gamma$ is called
treatment-realizable if each element (triadic chain) that belongs to $\mathfrak{P} \cup \{t\}$
consists of elements of some treatment $\phi \in T$ (which implies,
in particular, $\alpha \ne \beta \ne \gamma \ne \alpha$). The diversity test for selective
influences consists in checking the compliance of the hypothetical
JDC-vector with the following statement: for any
treatment-realizable polyhedral set $\mathfrak{P}$ over $x_1^{\alpha_1} x_2^{\alpha_2} x_3^{\alpha_3}$,

$$Dx_1^{\alpha_1} x_2^{\alpha_2} x_3^{\alpha_3} \le \sum_{t' \in \mathfrak{P}} Dt'. \tag{24}$$

The inequality trivially follows from the simplicial inequality
and the definition of $\mathfrak{P}$.

The classification p.q.-metric tests considered earlier form a
special case of the diversity tests. For complete analogy one
should replace chains in the formulation of the $P^{(2)}$-based tests
with a polygonal set $\mathfrak{P}$ of pairs of factor points (dipoles) over
a given dipole $d = x^\alpha y^\beta$ ($x^\alpha \ne y^\beta$). This set is defined as a set
obtainable by repeated applications of the following two rules:

1. for any $d = x^\alpha y^\beta$ ($x^\alpha \ne y^\beta$) and any $u^\mu \notin \{x^\alpha, y^\beta\}$, the set
$\{u^\mu y^\beta, x^\alpha u^\mu\}$ is a polygonal set over $d$;

2. if $\mathfrak{P}$ is a polygonal set over $d$, and $\mathfrak{P}'$ is a polygonal set
over any $d' \in \mathfrak{P}$, then the set $(\mathfrak{P} - \{d'\}) \cup \mathfrak{P}'$ is a
polygonal set over $d$.

The generalization to $s > 3$ involves polytopal sets of $s$-element
chains and is conceptually straightforward. The notion of an
irreducible chain is also generalizable to polytopal sets, but we
are not going to discuss this and related issues here: the diversity
function and diversity-based tests form a rich topic that deserves
a special investigation.

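Example 6.1. Let $\alpha, \beta, \gamma, \delta$ be binary (1/2) factors, and let the
set of allowable treatments $T$ consist of all combinations of the
factor points subject to the following constraint: $\{1^\alpha,1^\beta,2^\gamma,1^\delta\}$
is the only treatment in $T$ of the forms $\{1^\alpha,1^\beta,2^\gamma,v^\delta\}$,
$\{1^\alpha,1^\beta,v^\gamma,1^\delta\}$, $\{1^\alpha,v^\beta,2^\gamma,1^\delta\}$, and $\{v^\alpha,1^\beta,2^\gamma,1^\delta\}$. Let
the random variables $A,B,C,D$ in the hypothetical diagram
$(A,B,C,D) \looparrowleft (\alpha,\beta,\gamma,\delta)$ each have three values, denoted 1, 2, 3,
and let the distributions of $(A,B,C,D)$ be as shown in the tables,
with all omitted joint probabilities being zero:

    α  β  γ  δ    A  B  C  D    Pr
    x  y  z  u    1  2  3  1    1/3
    x  y  z  u    1  2  3  2    1/3
    x  y  z  u    1  2  3  3    1/3

    α  β  γ  δ    A  B  C  D    Pr
    1  1  2  1    1  2  3  1    1/2
    1  1  2  1    1  2  3  2    1/2
    1  1  2  1    1  2  3  3    0

where $\{x^\alpha,y^\beta,z^\gamma,u^\delta\}$ is any treatment in $T$ other than
$\{1^\alpha,1^\beta,2^\gamma,1^\delta\}$. It is easy to check that the 3-marginals
(hence also all lower-order marginals) of the distributions
satisfy marginal selectivity. One can also check that
$\{1^\alpha 1^\beta 1^\delta, 1^\alpha 1^\delta 1^\gamma, 1^\delta 1^\beta 1^\gamma\}$ is a polyhedral set over
$1^\alpha 1^\beta 1^\gamma$ (in fact, the simplest one, forming a tetrahedron with
vertices $1^\alpha, 1^\beta, 1^\gamma, 1^\delta$).
This polyhedral set is treatment-realizable, because

$$\{1^\alpha,1^\beta,1^\gamma\} \subset \{1^\alpha,1^\beta,1^\gamma,2^\delta\}, \quad \{1^\alpha,1^\beta,1^\delta\} \subset \{1^\alpha,1^\beta,2^\gamma,1^\delta\},$$

$$\{1^\alpha,1^\gamma,1^\delta\} \subset \{1^\alpha,2^\beta,1^\gamma,1^\delta\}, \quad \{1^\beta,1^\gamma,1^\delta\} \subset \{2^\alpha,1^\beta,1^\gamma,1^\delta\}.$$

Putting

$$D1^\alpha 1^\beta 1^\gamma = P^{(3)}\left[H_{1^\alpha}H_{1^\beta}H_{1^\gamma}\right] = \Pr\left[\{A=1,B=2,C=3\}(1^\alpha,1^\beta,1^\gamma,2^\delta)\right] = 1,$$

$$D1^\alpha 1^\beta 1^\delta = P^{(3)}\left[H_{1^\alpha}H_{1^\beta}H_{1^\delta}\right] = \Pr\left[\{A=1,B=2,D=3\}(1^\alpha,1^\beta,2^\gamma,1^\delta)\right] = 0,$$

$$D1^\alpha 1^\delta 1^\gamma = P^{(3)}\left[H_{1^\alpha}H_{1^\delta}H_{1^\gamma}\right] = \Pr\left[\{A=1,D=2,C=3\}(1^\alpha,2^\beta,1^\gamma,1^\delta)\right] = \frac{1}{3},$$

$$D1^\delta 1^\beta 1^\gamma = P^{(3)}\left[H_{1^\delta}H_{1^\beta}H_{1^\gamma}\right] = \Pr\left[\{D=1,B=2,C=3\}(2^\alpha,1^\beta,1^\gamma,1^\delta)\right] = \frac{1}{3},$$

where $H_{x^\mu}$ are elements of the hypothetical JDC-vector, we see
that the simplicial inequality is violated:

$$1 = D1^\alpha 1^\beta 1^\gamma > D1^\alpha 1^\beta 1^\delta + D1^\alpha 1^\delta 1^\gamma + D1^\delta 1^\beta 1^\gamma = \frac{2}{3}.$$

This rules out the possibility of $(A,B,C,D) \looparrowleft (\alpha,\beta,\gamma,\delta)$. □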
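
The arithmetic of Example 6.1 can be replayed with exact fractions. In the sketch below the distributions are taken from the tables of the example, while the helper names `dist` and `pr` and the keyword-argument interface are assumptions introduced for illustration.

```python
from fractions import Fraction as F

# Nonzero probabilities of (A, B, C, D) in Example 6.1.
generic = {(1, 2, 3, 1): F(1, 3), (1, 2, 3, 2): F(1, 3), (1, 2, 3, 3): F(1, 3)}
special = {(1, 2, 3, 1): F(1, 2), (1, 2, 3, 2): F(1, 2)}       # treatment (1, 1, 2, 1)

def dist(treatment):
    """Distribution of (A, B, C, D) at a treatment (alpha, beta, gamma, delta)."""
    return special if treatment == (1, 1, 2, 1) else generic

def pr(treatment, **values):
    """Pr[A=..., B=..., ...] at a treatment; keyword names are the outputs A, B, C, D."""
    pos = {"A": 0, "B": 1, "C": 2, "D": 3}
    return sum(p for outcome, p in dist(treatment).items()
               if all(outcome[pos[name]] == v for name, v in values.items()))

d_abc = pr((1, 1, 1, 2), A=1, B=2, C=3)    # D 1^alpha 1^beta 1^gamma
d_abd = pr((1, 1, 2, 1), A=1, B=2, D=3)    # D 1^alpha 1^beta 1^delta
d_adc = pr((1, 2, 1, 1), A=1, D=2, C=3)    # D 1^alpha 1^delta 1^gamma
d_dbc = pr((2, 1, 1, 1), D=1, B=2, C=3)    # D 1^delta 1^beta 1^gamma
print(d_abc, d_abd + d_adc + d_dbc, d_abc <= d_abd + d_adc + d_dbc)
# 1 2/3 False
```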



7. CONCLUSION 

Selectiveness in the influences exerted by a set of inputs upon 
a set of random and stochastically interdependent outputs is a 
critical feature of many psychological models, often built into 
the very language of these models. We speak of an internal rep- 
resentation of a given stimulus, as separate from an internal rep- 
resentation of another stimulus, even if these representations are 
considered random entities and they are not independent. We
speak of decompositions of response time into signal-dependent 
and signal-independent components, or into a perceptual stage 
(influenced by stimuli) and a memory-search stage (influenced 
by the number of memorized items), without necessarily assum- 
ing that the two components or stages are stochastically inde- 
pendent. Moreover, the same as with theory of measurement 
and model selection studies, the issue of selective probabilistic 
influences, while born within psychology and motivated by psy- 
chological theorizing, pertains in fact to any area of empirical 
science dealing with inputs and random outputs. 

In this paper, we have described the fundamental Joint Dis- 
tribution Criterion for selective influences, and proposed a di- 
rect application of this criterion to random variables with finite 
numbers of values, the Linear Feasibility Test for selective influ- 
ences. This test can be performed by means of standard linear 
programming. Due to the fact that any random output can be 
discretized, the Linear Feasibility Test is universally applicable, 
although one should keep in mind that if a diagram of selective 
influences is upheld by the test at some discretization, it may 
be rejected at a finer or non-nested discretization (but not at a 
coarser one). 
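
As a rough illustration of what the Linear Feasibility Test checks, the sketch below sets up the linear system for a hypothetical 2 x 2 design (binary factors alpha and beta, binary outputs A and B, with A influenced by alpha only and B by beta only). It then verifies that a candidate joint distribution Q of the four hypothetical variables (H_1alpha, H_2alpha, H_1beta, H_2beta) is nonnegative and reproduces the observable probabilities P. The toy generating model, the function names, and the data layout are all assumptions made for this example and are not taken from the paper. A full test would pass the same system to a linear programming solver to search for Q, rather than check a supplied one.

```python
from fractions import Fraction
from itertools import product

VALUES = (0, 1)                             # binary outputs (assumed toy design)
TREATMENTS = list(product((1, 2), (1, 2)))  # (level of alpha, level of beta)
R_VALUES = range(4)                         # hypothetical R, uniform on {0, 1, 2, 3}


def f(alpha, r):
    """Toy output A: depends on alpha and R only (an assumption of this sketch)."""
    return int(r >= alpha)


def g(beta, r):
    """Toy output B: depends on beta and R only (an assumption of this sketch)."""
    return int(r % 2 == beta - 1)


def observed_probabilities():
    """P[(alpha, beta, a, b)] = Pr[A = a, B = b] at treatment (alpha, beta)."""
    P = {}
    for al, be in TREATMENTS:
        for a, b in product(VALUES, VALUES):
            hits = sum(1 for r in R_VALUES if f(al, r) == a and g(be, r) == b)
            P[(al, be, a, b)] = Fraction(hits, len(R_VALUES))
    return P


def candidate_joint():
    """Q over h = (H_1alpha, H_2alpha, H_1beta, H_2beta), built from the same R."""
    Q = {h: Fraction(0) for h in product(VALUES, repeat=4)}
    for r in R_VALUES:
        Q[(f(1, r), f(2, r), g(1, r), g(2, r))] += Fraction(1, len(R_VALUES))
    return Q


def satisfies_lft_system(P, Q):
    """Check Q >= 0 and M Q = P. Row (alpha, beta, a, b) of the Boolean matrix M
    selects the vectors h whose alpha-entry equals a and whose beta-entry equals b."""
    if any(q < 0 for q in Q.values()):
        return False
    for (al, be, a, b), p in P.items():
        marginal = sum(q for h, q in Q.items()
                       if h[al - 1] == a and h[1 + be] == b)
        if marginal != p:
            return False
    return True
```

Here `satisfies_lft_system(observed_probabilities(), candidate_joint())` holds by construction, because P and Q are derived from the same R. Exact fractions are used so that the equalities M Q = P can be compared without rounding error.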

Based on the Joint Distribution Criterion we have also formu- 
lated a general scheme for constructing various necessary con- 
ditions (tests) for selective influences. Among the tests thus 
generated is a wide spectrum of distance-type tests and some 
other tests described in the paper. The results of some of these 
tests (e.g., all those involving expected values) are not invariant 
with respect to factor-point-specific transformations of the ran- 
dom outputs, which allows one to expand each of such tests into 
an infinity of different tests for different transformations. 

The abundance of different tests which we now have at our 
disposal poses new problems. The Linear Feasibility Test is superior
to other tests as it allows one to prove (rather than only
disprove) the adherence of a system of inputs and outputs to a
given diagram of selective influences (for a given discretization, 
if one is involved). It is possible, however, that discretization 
is not desirable, or the size of the problem is too large to be 
handled by available computational methods. In these cases one 
faces the problem of devising an optimal, or at least systematic 
way of applying a sequence of different necessary conditions, 
such as distance-type tests. Let us call a test $T_1$ stronger than
test $T_2$ with respect to a given diagram of selective influences if
the diagram cannot be upheld by $T_1$ and rejected by $T_2$, while the
reverse is possible. Thus, in Kujala and Dzhafarov (2008) it is
shown that the cosphericity test (Section 6.1) is stronger than the
Minkowski distance test with $p = 2$ (Section 5.2.1). We know
very little, however, about the comparative strengths of different 
tests on a broader scale. 

The problem of devising optimal strategies of sequential testing
arises also within the confines of a particular class of tests.
Thus, the classification test (Sections 5.1 and 5.2.2) and the
diversity test (Section 6.2) can be used repeatedly, each time with
a different choice of the partitions of the random outputs' domains.
We do not know at present how to organize the sequences
of these choices optimally. In the case of the Minkowski distance 
test we do not know in which order one should use different val- 
ues of p and different factor-point-specific transformations of 
the random variables. The latter also applies to the nonlinear 
transformations in the cosphericity test. 

Finally, adaptation of the population-level tests to data analysis
is another problem to be addressed by future research. Although
sample-level procedures corresponding to our tests seem
conceptually straightforward (Section 3.4.2), the issues of statistical
power and statistical interdependence compound the problems
of comparative strength of the tests and optimal strategy of
sequential testing.



Appendix A: GENERALIZATIONS TO ARBITRARY SETS 



Random Entities and Variables 

For the purposes of this paper it is convenient to view a random
entity $A$ as a quadruple $(\text{'A'},\mathcal{A},\Sigma,\mu)$, where 'A' is a unique
name, $\mathcal{A}$ is a nonempty set (of values of $A$), $\Sigma$ is a sigma algebra
of subsets of $\mathcal{A}$ (called measurable subsets), and $\mu$ is a
probability measure on $\Sigma$ with the interpretation that $\mu(a)$ for
any $a \in \Sigma$ is the probability with which $A$ falls within $a \subset \mathcal{A}$.
$(\mathcal{A},\Sigma)$ is referred to as the observation space for $A$. We call
the probability space $(\mathcal{A},\Sigma,\mu)$ the distribution for $A$ and say that
$A$ is distributed as $(\mathcal{A},\Sigma,\mu)$. The inclusion of the label 'A' is
needed to ensure an unlimited collection of distinct random entities
with the same distribution. If two random entities $A$ and $A'$
have the same distribution, we write $A \sim A'$. If $A$ and $B$ are
distributed as, respectively, $(\mathcal{A},\Sigma_{\mathcal{A}},\mu)$ and $(\mathcal{B},\Sigma_{\mathcal{B}},\nu)$, then we say
$B \sim f(A)$ if $f : \mathcal{A} \to \mathcal{B}$ is such that $b \in \Sigma_{\mathcal{B}}$ implies $f^{-1}(b) \in \Sigma_{\mathcal{A}}$
and $\nu(b) = \mu(f^{-1}(b))$, $\nu$ being referred to as the induced measure
(with respect to $\mu, f$), and the function $f$ being said to be
$(\mathcal{A},\Sigma_{\mathcal{A}},\mu) \to (\mathcal{B},\Sigma_{\mathcal{B}},\nu)$-measurable.

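With any indexed set of random entities $\{A_{\omega}\}_{\omega\in\Omega}$, each of
which is distributed as $(\mathcal{A}_{\omega},\Sigma_{\omega},\mu_{\omega})$, $\omega \in \Omega$, we associate its
"natural" observation space $(\mathcal{A},\Sigma)$, with $\mathcal{A} = \prod_{\omega\in\Omega}\mathcal{A}_{\omega}$ (Cartesian
product) and $\Sigma = \bigotimes_{\omega\in\Omega}\Sigma_{\omega}$ being the smallest sigma algebra
containing all sets of the form $a_{\omega} \times \prod_{\iota\in\Omega-\{\omega\}}\mathcal{A}_{\iota}$, $a_{\omega} \in \Sigma_{\omega}$.
We say that the random entities in $\{A_{\omega}\}_{\omega\in\Omega}$ possess a joint
distribution if $A = \{A_{\omega}\}_{\omega\in\Omega}$ is a random entity distributed as $(\mathcal{A},\Sigma,\mu)$
with $\mu(a_{\omega} \times \prod_{\iota\in\Omega-\{\omega\}}\mathcal{A}_{\iota}) = \mu_{\omega}(a_{\omega})$. Every subset $\Omega' \subset \Omega$
possesses a marginal distribution $(\prod_{\omega\in\Omega'}\mathcal{A}_{\omega}, \bigotimes_{\omega\in\Omega'}\Sigma_{\omega}, \mu')$, where
$\mu'(a) = \mu(a \times \prod_{\iota\in\Omega-\Omega'}\mathcal{A}_{\iota})$, for all $a \in \bigotimes_{\omega\in\Omega'}\Sigma_{\omega}$.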



Footnote. The standard definition of a random entity (also called "random
element" or simply "random variable") is a measurable function from a sample
space to an observation space. The present terminology can be reconciled with
this view by considering $(\{\text{'A'}\}\times\mathcal{A}, \{\{\text{'A'}\}\times a : a \in \Sigma\}, \nu)$ a sample space,
$(\mathcal{A},\Sigma)$ an observation space, and $A$ the projection function $\{\text{'A'}\}\times\mathcal{A} \to \mathcal{A}$. In
the case of jointly distributed random entities, $A = \{A_{\omega}\}_{\omega\in\Omega}$, each of them,
with an observation space $(\mathcal{A}_{\omega},\Sigma_{\omega})$, can be defined as the projection function
$\{\text{'A'}\}\times\mathcal{A} \to \mathcal{A}_{\omega}$. We do not, however, assume a common sample space for
all random entities being considered. The notion of a sample space is a source
of conceptual confusions, the chief one being the notion that there is only one
sample space "in this universe," so that any set of random entities possesses a
joint distribution.

Remark A.1. Note that the elements of the Cartesian product
$\prod_{\omega\in\Omega}\mathcal{A}_{\omega}$ are choice functions $\Omega \to \bigcup_{\omega\in\Omega}\mathcal{A}_{\omega}$, that is, they
are sets of pairs of the form $(\omega,a)$, $\omega \in \Omega$, $a \in \mathcal{A}_{\omega}$. This
means that the indexation of $\{A_{\omega}\}_{\omega\in\Omega}$ is part of the identity of
$\mathcal{A} = \prod_{\omega\in\Omega}\mathcal{A}_{\omega}$, hence also of the distribution of $A = \{A_{\omega}\}_{\omega\in\Omega}$.
Ideally, only the "ordinal structure" of the indexing set $\Omega$ should
matter, and this can be ensured by agreeing that $\Omega$ is always
an initial segment of the class of ordinal numbers. With these
conventions in mind, $\{A_{\omega}\}_{\omega\in\Omega}$ can be viewed as generalizing
the notion of a finite vector (although it is convenient not to
complicate notation to reflect this fact). For sets of jointly
distributed and identically indexed random entities, the relation
$\{A_{\omega}\}_{\omega\in\Omega} \sim \{B_{\omega}\}_{\omega\in\Omega}$ should always be understood in the sense
of "corresponding indices," implying, in particular, $\{A_{\omega}\}_{\omega\in\Omega'} \sim
\{B_{\omega}\}_{\omega\in\Omega'}$ for any subset $\Omega'$ of $\Omega$.

The equality $A_1 = A_2$ in the present context means that the
two random entities have a common observation space $(\mathcal{A},\Sigma)$
and that $\{A_1,A_2\}$ is a jointly distributed random entity with measure
$\mu$ such that $\mu(\{(a_1,a_2) \in \mathcal{A}\times\mathcal{A} : a_1 = a_2\}) = 1$ (this corresponds
to the equality "almost surely" in the traditional terminology).
We also follow the common practice of using equality to
replace "is" or "denotes" in definitions and abbreviations, such
as $A = \{A_{\omega}\}_{\omega\in\Omega}$. The two meanings of equality are easily
distinguished by context.

A random variable is a special case of random entity. Its
definition can be given as follows: (i) if $\mathcal{A}$ is countable, $\Sigma$ is the
power set of $\mathcal{A}$, then a random entity distributed as $(\mathcal{A},\Sigma,\mu)$
is a random variable; (ii) if $\mathcal{A}$ is an interval of reals, $\Sigma$ is the
Lebesgue sigma-algebra on $\mathcal{A}$, then a random entity distributed
as $(\mathcal{A},\Sigma,\mu)$ is a random variable; (iii) any jointly distributed
vector $(A_1,\ldots,A_n)$ with all components random variables is a
random variable. The notion thus defined is more general than in the
main text, but the theory presented there applies with no
modifications.

Lemma A.2. A set $\{A_{\omega}\}_{\omega\in\Omega}$ of random entities possesses a
joint distribution if and only if there is a random entity $R$
distributed as a probability space $(\mathcal{R},\Sigma_{\mathcal{R}},\nu)$ and some functions
$\{f_{\omega} : \mathcal{R} \to \mathcal{A}_{\omega}\}_{\omega\in\Omega}$, such that $\{A_{\omega}\}_{\omega\in\Omega} = \{f_{\omega}(R)\}_{\omega\in\Omega}$.

Proof. (Note that the formulation implies that all the functions
involved are appropriately measurable.) To show sufficiency,
observe that the induced measure $\mu$ of any set of the form
$\prod_{\omega\in N} a_{\omega} \times \prod_{\iota\in\Omega-N}\mathcal{A}_{\iota}$, where $N$ is a finite subset of $\Omega$ and
$a_{\omega} \in \Sigma_{\omega}$ for $\omega \in N$, is $\nu(\bigcap_{\omega\in N} f_{\omega}^{-1}(a_{\omega}))$, and this measure is
uniquely extended to $\bigotimes_{\omega\in\Omega}\Sigma_{\omega}$. To show necessity, put $R =
\{A_{\omega} : \omega \in \Omega\}$ and, for every $\omega \in \Omega$, define $f_{\omega} : \mathcal{R} \to \mathcal{A}_{\omega}$ to be
the (obviously measurable) projection $f_{\omega} : \prod_{\iota\in\Omega}\mathcal{A}_{\iota} \to \mathcal{A}_{\omega}$. □

Corollary A.3. If $\Omega$ is finite and $\{A_{\omega}\}_{\omega\in\Omega}$ is a set of random
variables, then $R$ in Lemma A.2 can be chosen to be a random
variable. Moreover, $R$ can be chosen arbitrarily, as any continuously
(atomlessly) distributed random variable (e.g., uniformly
distributed between 0 and 1).


Proof. The first statement follows from the fact that $R =
\{A_{\omega}\}_{\omega\in\Omega}$ in the necessity part of Lemma A.2 is then a random
variable. The second statement follows from Theorem 1
in Dzhafarov & Gluhovsky, 2006, based on a general result for
standard Borel spaces (e.g., in Kechris, 1995, p. 116). □
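
For finitely many discrete random variables, the second statement of Corollary A.3 can be made concrete. Any joint distribution can be reproduced by functions of a single R uniformly distributed on [0, 1), by cutting the unit interval into pieces whose lengths equal the joint probabilities. The following is a minimal sketch of that construction; `make_functions`, `induced_joint`, and the data layout are names assumed for this illustration, not notation from the paper.

```python
from fractions import Fraction


def make_functions(joint):
    """joint: {tuple_of_values: probability} for a finite set of random variables.
    Returns (cuts, f), where cuts is a list of (upper_bound, tuple_of_values)
    partitioning [0, 1), and f(omega, r) is the value of the omega-th variable
    when the uniform variable R takes the value r."""
    cuts, upper = [], Fraction(0)
    for values, p in sorted(joint.items()):
        if p > 0:
            upper += p
            cuts.append((upper, values))
    if upper != 1:
        raise ValueError("probabilities must sum to 1")

    def f(omega, r):
        for bound, values in cuts:
            if r < bound:
                return values[omega]
        raise ValueError("r must lie in [0, 1)")

    return cuts, f


def induced_joint(cuts):
    """Distribution of (f(0, R), f(1, R), ...) for uniform R: each interval
    contributes its length, i.e. the Lebesgue measure of its preimage."""
    out, lower = {}, Fraction(0)
    for bound, values in cuts:
        out[values] = out.get(values, Fraction(0)) + (bound - lower)
        lower = bound
    return out
```

Applied to a toy joint distribution of two binary variables, `induced_joint` returns the original probabilities (zero-probability cells omitted), so the two variables are indeed functions of the same R.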



Selective influences and JDC 

A factor is defined as a nonempty set of factor points with
a unique name: the notation used is $x^{\alpha} = \{x, \text{'}\alpha\text{'}\}$. Let $\Phi$ be a
nonempty set of factors, and let $T \subset \prod\Phi$ be a nonempty set of
treatments. Note that any treatment $\phi \in T$ is a function $\phi : \Phi \to
\bigcup\Phi$, so $\phi(\alpha)$ denotes the factor point $x^{\alpha}$ of the factor $\alpha$ which
belongs to the treatment $\phi$. (The notation for $\phi(\alpha)$ used in the
main text is $\phi_{\{\alpha\}}$.)

Let $\Omega$ be an indexing set for a set of random entities $\{A_{\omega}\}_{\omega\in\Omega}$.
A diagram of selective influences is a mapping $M : \Omega \to 2^{\Phi}$. For
any such diagram one can redefine the set of factors and the set
of treatments in the following way. For every $\omega \in \Omega$, put

\[ \omega^{*} = \{x^{\omega^{*}} : x \in \prod M(\omega)\}, \]

if $M(\omega)$ is nonempty; if it is empty, put $\omega^{*} = \{\emptyset^{\omega^{*}}\}$. This
establishes the bijective mapping $M^{*} : \Omega \to \Phi^{*}$, where $\Phi^{*} =
\{\omega^{*}\}_{\omega\in\Omega}$. For each treatment $\phi \in T$ we define the corresponding
treatment $\phi^{*}$ as $\{x^{\omega^{*}} : x \subset \phi \wedge x \in \prod M(\omega), \omega \in \Omega\}$. The
set of all such treatments $\phi^{*}$ is denoted $T^{*}$. (In the main text the
procedure just described is called canonical rearrangement.) In
the following we omit asterisks and simply put $\Phi = \Omega$, replacing
$M : \Omega \to 2^{\Phi}$ with the identity map $M : \Omega \to \Phi$.
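
The canonical rearrangement can be sketched for finite designs as follows. This is only an illustrative reading of the procedure, under the assumption that a diagram is stored as a dictionary from output names to the names of the factors influencing them, and a treatment as a dictionary from factor names to factor points. The function name and data layout are hypothetical.

```python
def canonical_rearrangement(diagram, treatments):
    """diagram: {output_name: iterable of names of the factors influencing it}
    treatments: list of {factor_name: factor_point}
    Returns a list of rearranged treatments {output_name: new_factor_point}.
    The new point for output w is the tuple of (factor, point) pairs for the
    factors in diagram[w], sorted by factor name; it is the empty tuple when
    no factor influences w (the single "dummy" point of the new factor)."""
    rearranged = []
    for phi in treatments:
        phi_star = {}
        for w, factors in diagram.items():
            phi_star[w] = tuple((a, phi[a]) for a in sorted(factors))
        rearranged.append(phi_star)
    return rearranged
```

After the rearrangement every output has exactly one factor of its own, whose points are combinations of the original factor points. This is what allows the diagram to be replaced by the identity map.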

Among several equivalent definitions of selective influences 
we choose here the one most immediately prompting the Joint 
Distribution Criterion (JDC). 

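Definition A.4. Let $A = \{A_{\phi}\}_{\phi\in T}$, and $A_{\phi} = \{A_{\phi,\alpha}\}_{\alpha\in\Phi}$ for
every $\phi \in T$. Let $T$ be a set of treatments associated with
a set of factors $\Phi$. Let $A_{\phi,\alpha}$ for each $\alpha,\phi$ be distributed as
$(\mathcal{A}_{\phi(\alpha)},\Sigma_{\phi(\alpha)},\mu_{\phi,\alpha})$. We say that each $A_{\phi,\alpha}$ is selectively
influenced by $\alpha$ ($\alpha \in \Phi$, $\phi \in T$), and write schematically $A \looparrowleft \Phi$, if
there is a random entity $R$ distributed as $(\mathcal{R},\Sigma_{\mathcal{R}},\nu)$ and some
functions $\{f_{x^{\alpha}} : \mathcal{R} \to \mathcal{A}_{x^{\alpha}}\}_{x^{\alpha}\in\bigcup\Phi}$ such that

\[ A_{\phi} = \{A_{\phi,\alpha}\}_{\alpha\in\Phi} \sim \{f_{\phi(\alpha)}(R)\}_{\alpha\in\Phi} \]

for every treatment $\phi \in T$.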

Remark A.5. Note that the formulation implies that all the
functions involved are appropriately measurable. Also, in
$\{f_{x^{\alpha}} : \mathcal{R} \to \mathcal{A}_{x^{\alpha}}\}_{x^{\alpha}\in\bigcup\Phi}$ the set $\bigcup\Phi$ can be replaced with
$\bigcup_{\phi\in T,\,\alpha\in\Phi}\phi(\alpha)$ if the latter is a proper subset of $\bigcup\Phi$ (and the
same applies to the definition of $H$ in the theorem below). We
assume, however, that factor points never used in treatments can
simply be deleted from the factors.

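Remark A.6. In the main text we assume that $(\mathcal{A}_{\phi(\alpha)},\Sigma_{\phi(\alpha)}) =
(\mathcal{A}_{\alpha},\Sigma_{\alpha})$, that is, the observation space $(\mathcal{A}_{\alpha},\Sigma_{\alpha})$ of the entity
$A_{\phi,\alpha}$ is the same across different treatments $\phi \in T$. In footnote
6 we mention that this constraint is not essential, as the random
entities $A_{\phi,\alpha}$ can always be redefined to force $(\mathcal{A}_{\phi(\alpha)},\Sigma_{\phi(\alpha)}) =
(\mathcal{A}_{\alpha},\Sigma_{\alpha})$ without affecting selective influence. This redefinition
can be done in a variety of ways, the simplest one being to put

\[ \mathcal{A}^{*}_{\alpha} = \bigcup_{\phi\in T} \{\phi(\alpha)\} \times \mathcal{A}_{\phi(\alpha)} \]

and let $\Sigma^{*}_{\alpha}$ be the smallest sigma-algebra containing
$\{\{\phi(\alpha)\} \times a : a \in \Sigma_{\phi(\alpha)}, \phi \in T\}$. Define $g_{\phi(\alpha)} : \mathcal{A}_{\phi(\alpha)} \to \mathcal{A}^{*}_{\alpha}$
by $g_{\phi(\alpha)}(a) = (\phi(\alpha),a)$, for $a \in \mathcal{A}_{\phi(\alpha)}$, $\phi \in T$, $\alpha \in \Phi$. Then
$A^{*}_{\phi,\alpha} = g_{\phi(\alpha)}(A_{\phi,\alpha})$ and $A^{*}_{\phi} = \{A^{*}_{\phi,\alpha}\}_{\alpha\in\Phi}$ are the redefined
random entities sought. Note that if $A \looparrowleft \Phi$, then
$A^{*} = \{A^{*}_{\phi}\}_{\phi\in T} \looparrowleft \Phi$, because Definition A.4 applies to $A^{*}$ with
the same $R$ and with the composite functions $g_{x^{\alpha}} \circ f_{x^{\alpha}}$ replacing
$f_{x^{\alpha}}$, for all $x^{\alpha} \in \bigcup\Phi$. (In the terminology of the main text, $g_{x^{\alpha}}$
are factor-point-specific transformations.)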

Theorem A.7 (JDC). A necessary and sufficient condition for
$A \looparrowleft \Phi$ in Definition A.4 is the existence of a set of jointly
distributed random entities

\[ H = \{H_{x^{\alpha}}\}_{x^{\alpha}\in\bigcup\Phi} \]

(one random entity for each factor point of each factor), such
that

\[ A_{\phi} \sim \{H_{\phi(\alpha)}\}_{\alpha\in\Phi} \]

for every treatment $\phi \in T$.

Proof. Immediately follows from the definition and Lemma A.2. □

Theorem A.8. If $\bigcup\Phi$ in Definition A.4 is a finite set and $A_{\phi,\alpha}$
is a random variable for every $\alpha, \phi$, then $R$ can always be chosen
to be a random variable. Moreover, $R$ can be chosen arbitrarily,
as any continuously (atomlessly) distributed random variable.

Proof. Immediately follows from JDC and Corollary A.3. □

Remark A.9. In Dzhafarov and Gluhovsky (2006) this inference
was not made because JDC at that time was not explicitly
formulated (outside quantum mechanics, see footnotes 11 and 13).

The three basic properties of selective influences listed in
Section 3.3 trivially generalize to arbitrary sets of factors and
random entities.



Distance-type tests 

The principles of test construction (Section 3.3), and the logic
of the distance-type tests in particular, apply without changes 
to arbitrary sets of factors. As to the random entities, some of 
the test measures are confined to discrete and/or real-valued vari- 
ables (e.g., information-based and Minkowski-type ones), others 
(such as classification measures) are completely general. 

We will use the notation and terminology adopted in Dzhafarov
and Kujala (2010). Chains of factor points can be denoted
by capital Roman letters, $X = x_1^{\alpha_1} \ldots x_l^{\alpha_l}$. A subsequence of
points belonging to a chain forms its subchain. A concatenation
of two chains $X$ and $Y$ is written as $XY$. So, we can have chains
$x^{\alpha}Xy^{\beta}$, $x^{\alpha}XYy^{\beta}$, etc. The number of points in a chain $X$ is its
cardinality, $|X|$. For any treatment-realizable chain $X = x_1^{\alpha_1} \ldots x_l^{\alpha_l}$,
we write

\[ DX = \sum_{i=1}^{l-1} D x_i^{\alpha_i} x_{i+1}^{\alpha_{i+1}} \]

(with the understanding that the sum is zero if $l$ is 0 or 1).

A treatment-realizable chain $u^{\mu}Xv^{\nu}$ is called compliant (with
the chain inequality) if $Du^{\mu}v^{\nu} \le Du^{\mu}Xv^{\nu} = Du^{\mu}x_1^{\alpha_1} + DX +
Dx_l^{\alpha_l}v^{\nu}$; it is called contravening (the chain inequality) if
$Du^{\mu}v^{\nu} > Du^{\mu}Xv^{\nu}$. The proofs of the two lemmas below are very
similar, but it is convenient to keep them separate.
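
The chain sum and the chain inequality are straightforward to compute once a distance-type measure D is available for pairs of factor points. The sketch below assumes D is given as a lookup table on unordered pairs; the helper names (`make_distance`, `chain_sum`, `is_contravening`) are hypothetical and serve only to illustrate the definitions above.

```python
def make_distance(table):
    """Wrap a table {frozenset({x, y}): value} as a symmetric function D(x, y)."""
    def D(x, y):
        return table[frozenset((x, y))]
    return D


def chain_sum(chain, D):
    """DX: sum of D over consecutive pairs of the chain.
    Returns 0 for chains with zero or one point."""
    return sum(D(x, y) for x, y in zip(chain, chain[1:]))


def is_contravening(chain, D):
    """True if D(first point, last point) strictly exceeds the chain sum,
    i.e. the chain violates the chain inequality."""
    return D(chain[0], chain[-1]) > chain_sum(chain, D)
```

A chain for which `is_contravening` returns `False` is compliant in the sense defined above. Only the pairs that are consecutive in the chain, plus the pair of endpoints, need to be present in the table.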

Lemma A.10. If a treatment-realizable chain $X_0 = x_1^{\alpha_1} \ldots x_l^{\alpha_l}$
($l \ge 3$) is contravening, then it contains a contravening subchain
in which no factor point occurs more than once.

Proof. If $l = 3$ then the chain contains no factor point more than
once, because otherwise it is not treatment-realizable. If $l > 3$,
and $X_0$ contains factor points $x_i^{\alpha_i} = x_j^{\alpha_j}$, then it can be presented
as $X_0 = x_1^{\alpha_1} \ldots x_i^{\alpha_i} U x_j^{\alpha_j} \ldots x_l^{\alpha_l}$, where $U$ is some nonempty
subchain ($i$ may coincide with 1 or $j$ coincide with $l$, but not both).
But then $X_1 = x_1^{\alpha_1} \ldots x_i^{\alpha_i} \ldots x_l^{\alpha_l}$ is also treatment-realizable and
contravening, because

\[
\begin{aligned}
Dx_1^{\alpha_1}x_l^{\alpha_l} > DX_0 &= Dx_1^{\alpha_1} \ldots x_i^{\alpha_i} U x_j^{\alpha_j} \ldots x_l^{\alpha_l} \\
&\ge Dx_1^{\alpha_1} \ldots x_i^{\alpha_i} \ldots x_l^{\alpha_l} = DX_1.
\end{aligned}
\]

If $X_1$ contains two equal factor points, then $3 \le |X_1| < |X_0|$, and
we can repeat the same procedure to obtain $X_2$, etc. As the
procedure has to stop at some $X_t$, this subchain will contain no factor
point twice. □
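
The reduction used in this proof amounts to cutting out every segment that lies between two occurrences of the same factor point. A minimal sketch follows, assuming factor points are represented by hashable Python objects and D takes nonnegative values; the function names are hypothetical and the code does not itself check treatment-realizability.

```python
def remove_repeated_points(chain):
    """Return a subchain with the same first and last points in which no
    point occurs twice. Whenever a point reappears, everything after its
    first occurrence (the segment U and the repeated point) is dropped."""
    out, position = [], {}
    for point in chain:
        if point in position:
            keep = position[point] + 1      # keep up to the first occurrence
            for dropped in out[keep:]:
                del position[dropped]
            del out[keep:]
        else:
            position[point] = len(out)
            out.append(point)
    return out


def chain_sum(chain, D):
    """Sum of D over consecutive pairs of the chain (0 for short chains)."""
    return sum(D(x, y) for x, y in zip(chain, chain[1:]))
```

Because each cut removes only nonnegative terms from the chain sum while leaving the endpoints unchanged, the reduced chain remains contravening whenever the original one is. This is the inequality displayed in the proof.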

Lemma A.11. If a treatment-realizable chain $X_0 = x_1^{\alpha_1} \ldots x_l^{\alpha_l}$
($l \ge 3$) is contravening, then it contains a contravening
irreducible subchain.

Proof. By the previous lemma, we can assume that every factor
point in $X_0$ occurs no more than once. If $l = 3$, the chain $X_0$ itself
is irreducible, because otherwise there would exist a treatment
$\phi \in T$ that includes the elements of the chain, and this would
make the chain compliant. If $l > 3$, and the chain $X_0$ is not
irreducible, then it must contain a subchain $x_i^{\alpha_i}x_j^{\alpha_j}$ such that $j > i + 1$
and $\{x_i^{\alpha_i},x_j^{\alpha_j}\}$ is part of some treatment $\phi \in T$. The chain then
can be presented as $X_0 = x_1^{\alpha_1} \ldots x_i^{\alpha_i} U x_j^{\alpha_j} \ldots x_l^{\alpha_l}$, where $U$ is some
nonempty subchain ($i$ may coincide with 1 or $j$ with $l$, but not
both). The subchain $x_i^{\alpha_i} U x_j^{\alpha_j}$ is clearly treatment-realizable. If
it is contravening, then we replace $X_0$ with $X_1 = x_i^{\alpha_i} U x_j^{\alpha_j}$; if it
is compliant, then we replace $X_0$ with $X_1 = x_1^{\alpha_1} \ldots x_i^{\alpha_i}x_j^{\alpha_j} \ldots x_l^{\alpha_l}$.
In both cases we obtain a treatment-realizable subchain $X_1$ of $X_0$
such that $3 \le |X_1| < |X_0|$, and $X_1$ is contravening; in the former
case $X_1 = x_i^{\alpha_i} U x_j^{\alpha_j}$ is contravening by construction, in the latter
case $Dx_i^{\alpha_i} U x_j^{\alpha_j} \ge Dx_i^{\alpha_i}x_j^{\alpha_j}$, whence

\[
\begin{aligned}
Dx_1^{\alpha_1}x_l^{\alpha_l} > DX_0 &= Dx_1^{\alpha_1} \ldots x_i^{\alpha_i} U x_j^{\alpha_j} \ldots x_l^{\alpha_l} \\
&\ge Dx_1^{\alpha_1} \ldots x_i^{\alpha_i}x_j^{\alpha_j} \ldots x_l^{\alpha_l} = DX_1.
\end{aligned}
\]

If $X_1$ is not irreducible, we can apply the same procedure to $X_1$
to obtain a contravening subchain $X_2$ with $3 \le |X_2| < |X_1|$, and
continue in this manner. Eventually we have to reach a contravening
subchain $X_t$ of $X_0$ such that $|X_t| \ge 3$ and the procedure
cannot continue, indicating that $X_t$ is irreducible. □